Description

[0001] The present invention relates to a neural network, and more particularly to a neural network that can be trained
to learn a linearly inseparable input pattern and output values therefor (the combination of such a linearly inseparable
input pattern and output values therefor will be referred to as an "input/output pattern"), and to a learning method for
said neural network.

[0002] Various models for neural networks have heretofore been proposed. It is known that the perceptron is the
most basic neural network model. The perceptron is based on a learning process in which only the weight of
association (association coefficient) between input and output layers is varied. The learning process of the perceptron is
advantageous in that it is simpler than other learning processes. The learning process is a process in which the weight
is varied to get the output to approach a target output.

[0003] However, the perceptron has a problem in that it cannot be trained to learn linearly inseparable patterns.
If a pattern that can be classified into two kinds is linearly separable, it can be separated into the classifications by a
straight line when it is two-dimensional and by a multidimensional plane when it is three-dimensional or more.

[0004] The problem of the perceptron in that linearly inseparable patterns cannot be learned can be solved by a
process known as back propagation. Back propagation is applicable to a multilayer neural network model, and
allows linearly inseparable patterns to be learned. Back propagation basically serves to minimize the square
function of an error between a target output and an actual output, and uses an optimization process referred to as the
method of steepest descent. Therefore, if there is a local minimum present in an error function obtained from given
input/output patterns, then a learning failure may result when the process is trapped in the local minimum. Back propagation is
also disadvantageous in that it is cumbersome to adjust parameters and initialize the weights, it is difficult to determine
the number of necessary neurons of a hidden layer between input and output layers, and the process requires a large
amount of calculation and is time-consuming.

[0005] There have also been proposed algorithms for adding neuron units needed to construct a multilayer neural
network.

[0006] One of the proposed algorithms serves to determine the number of neuron unit layers of a feedforward
hierarchical network and the number of neuron units in each of the neuron unit layers. It adds neuron unit layers or neuron
units in neuron unit layers until convergence is reached. For details, see The Upstart Algorithm: A Method for
Constructing and Training Feedforward Neural Networks, written by Marcus Frean, Neural Computation, Vol. 2, pages 198
- 209, 1990, Massachusetts Institute of Technology.

[0007] According to another proposed algorithm, neuron units are added as required according to predetermined
rules in order to build a multilayer perceptron composed of linear threshold units. For details, see Learning in
feedforward layered networks: the tiling algorithm, written by Marc Mézard and Jean-Pierre Nadal, Journal of Physics A: Math.
Gen., Vol. 22, pages 2191 - 2203, 1989.

[0008] The above proposed algorithms add necessary neuron units until a desired output is obtained with respect
to a given input. Therefore, the required neuron layers and their neurons are not determined until a neural
network is finally constructed, and the number of neuron layers used and the number of neurons used tend to be large.
As a result, the resultant neural networks are liable to be complex in structure, and do not lend themselves to
high-speed processing operation.

[0009] It is an object of the present invention to provide a neural network which solves the problem of linear
inseparability of the perceptron and the problem of being trapped in a local minimum of back propagation, and which can
learn a linearly inseparable pattern with as few neurons as possible, and also a learning method for automatically
generating hidden neurons necessary for such a neural network.

[0010] According to the present invention, a neural network is provided as set out in claim 1.

[0011] According to the present invention, a learning method for a neural network is also provided as set out in claim 5.

[0012] The hidden neurons of the hidden layer between the input and output layers are automatically generated, so
that a linearly inseparable input/output pattern is divided into linearly separable input/output patterns, which are then
combined together. As a whole, the neural network is capable of learning input/output patterns that are linearly
inseparable. Since the input and output layers are provided in advance, and only a minimum number of hidden neurons or
a number of hidden neurons close thereto are determined, the neural network is highly versatile, and can operate at
high speed for information processing.

[0013] Experimental results, described later on, indicate that the learning method was able to learn all input/output
patterns with four inputs and one output. The learning method had a learning speed about 500 times faster than the
learning speed of the back propagation process. Patterns whose inputs are all 0 can be learned by expanding or
modifying the basic arrangement of the neural network and the learning method according to the present invention.

[0014] Some embodiments of the invention will now be described by way of example and with reference to the
accompanying drawings, in which:






EP 0 521 729 B1 



FIG. 1 is a block diagram illustrating the fundamental concept of the present invention;

FIGS. 2(a) through 2(f) are diagrams showing a principle for transforming a linearly unseparable pattern into a
linearly separable pattern;

FIG. 3 is a diagram of a neural network according to an embodiment of the present invention;

FIG. 4 is a diagram showing a linearly separable allocation process;

FIGS. 5(a) and 5(b) are diagrams showing the positional relationship between two pattern sets in a state variable
space of input neurons;

FIGS. 6(a) and 6(b) are diagrams illustrative of transformation of a linearly unseparable pattern;

FIGS. 7(a) and 7(b) are diagrams showing transformation functions for two kinds of patterns;

FIG. 8 is a flowchart of a learning method according to an embodiment of the present invention;

FIG. 9 is a flowchart of a transformation routine in the flowchart of FIG. 8;

FIGS. 10(a) through 10(c) are diagrams showing a pattern transformation for speeding up algorithms;

FIGS. 11 and 12 are diagrams showing all algorithms in detail;

FIG. 13 is a diagram of definitions of variables shown in FIGS. 11 and 12;

FIGS. 14 through 16 are a flowchart of an algorithm γ used in the learning method;

FIG. 17 is a flowchart of an algorithm shown in FIG. 16;

FIG. 18 is a diagram of a neural network with bias neurons;

FIG. 19 is a diagram showing a bias neuron unit with a self feedback loop;

FIG. 20 is a diagram of a neural network with multiple inputs and multiple outputs;

FIG. 21 is a diagram of a multilayer neural network having a plurality of hidden layers;

FIG. 22 is a logarithmic graph of calculating times of the learning method of the present invention and the back
propagation method; and

FIG. 23 is a graph showing percentages of correct answers according to the learning method of the present
invention and the back propagation method with respect to the number of patterns.

[0015] FIG. 1 shows the fundamental concept of the present invention. According to the present invention, as shown
in FIG. 1, a linearly inseparable input/output pattern P is divided into linearly separable patterns $Q_1, Q_2, \ldots, Q_n$, and a
combination pattern R for combining the linearly separable patterns $Q_1, Q_2, \ldots, Q_n$ is generated. The linearly separable
patterns $Q_1, Q_2, \ldots, Q_n$ are implemented by hidden neurons, and the combination pattern R is implemented by output
neurons, so that the linearly inseparable input/output pattern P can be learned as a whole.

[0016] According to the present invention, a neural network having input, output, and hidden layers comprises a
lower neural network model composed of hidden layer neurons and input layer neurons for learning the linearly
separable patterns $Q_1, Q_2, \ldots, Q_n$, and a higher neural network model composed of hidden layer neurons and output layer
neurons for combining the linearly separable patterns $Q_1, Q_2, \ldots, Q_n$ into the combination pattern R.

[0017] The learning method according to the present invention effects a predetermined learning process on a
feedforward neural network for generating a plurality of linearly separable patterns from a given linearly inseparable pattern.

[0018] The neural network and the learning method therefor will hereinafter be described in greater detail.



(1) Formation of a neural network:



[0019] Any arbitrary input/output pattern, including a linearly inseparable pattern P, can be represented by a
combination of linearly separable patterns $Q_1, Q_2, \ldots, Q_n$.

[0020] For example, as shown in FIG. 2(a), two patterns indicated respectively by black dots (ON) and white dots
(OFF) are presented in a multidimensional conceptual coordinate system that is conceptually represented by
two-dimensional coordinates. The issue here is whether these patterns are linearly separable.

[0021] Now, straight lines a, b, c, d as shown in FIG. 2(b) are introduced into the patterns. The patterns shown in
FIG. 2(a) can be expressed by sum and difference sets in regions indicated by respective upwardly directed arrows
on the straight lines.

[0022] The characters typed in bold represent vectors in the following description.

[0023] The straight lines a, b, c, d are represented as follows:

$$a: \mathbf{w}_A^T \mathbf{x} = \theta,$$

$$b: \mathbf{w}_B^T \mathbf{x} = \theta,$$

$$c: \mathbf{w}_C^T \mathbf{x} = \theta,$$

$$d: \mathbf{w}_D^T \mathbf{x} = \theta$$

where $\mathbf{w}$ is an association weight vector, $\mathbf{x}$ is a state value vector of input neurons, $T$ represents transposition, and $\theta$ is
a threshold. In FIG. 2(b), $\mathbf{w}_A > 0$, $\mathbf{w}_B > 0$, $\mathbf{w}_C > 0$, $\mathbf{w}_D > 0$, and $\theta > 0$.

[0024] As shown in FIGS. 2(c) through 2(f), the sets of the black dots in the regions indicated by the upwardly directed
arrows on the straight lines are contained in the following regions:

Set of black dots in FIG. 2(c): $\mathbf{w}_A^T \mathbf{x} > \theta$;
Set of black dots in FIG. 2(d): $\mathbf{w}_B^T \mathbf{x} > \theta$;
Set of black dots in FIG. 2(e): $\mathbf{w}_C^T \mathbf{x} > \theta$;
and
Set of black dots in FIG. 2(f): $\mathbf{w}_D^T \mathbf{x} > \theta$.

[0025] These regions or sets are indicated by A, B, C, D, respectively. If an operation to determine a sum set is
indicated by a symbol <+> and an operation to determine a difference set is indicated by a symbol <->, then the two
types of patterns that are separated as shown in FIG. 2(b) are expressed as follows:

{(A <-> B) <+> C} <-> D.



[0026] Whether the variable vector $\mathbf{x}$ is contained in the regions A, B, C, D or not can be indicated by
whether the values of $L(\mathbf{w}_A^T \mathbf{x} - \theta)$, $L(\mathbf{w}_B^T \mathbf{x} - \theta)$, $L(\mathbf{w}_C^T \mathbf{x} - \theta)$, $L(\mathbf{w}_D^T \mathbf{x} - \theta)$ are 1 or 0, where $L$ is a threshold function that
is 1 or 0 depending on a variable $z$ as follows:

when $z > 0$, $L(z) = 1$, and
when $z < 0$, $L(z) = 0$.

[0027] If any of the above values is 1, then the variable vector $\mathbf{x}$ is present in the region indicated by the corresponding $\mathbf{w}$.
[0028] If it is assumed that the above values are expressed as follows:

$$x_A = L(\mathbf{w}_A^T \mathbf{x} - \theta),$$

$$x_B = L(\mathbf{w}_B^T \mathbf{x} - \theta),$$

$$x_C = L(\mathbf{w}_C^T \mathbf{x} - \theta),$$

and

$$x_D = L(\mathbf{w}_D^T \mathbf{x} - \theta),$$

then the patterns shown in FIG. 2(b) can be expressed by the following equation:

$$y = L((\theta + \varepsilon)x_A - 2\varepsilon x_B + 2\varepsilon x_C - 2\varepsilon x_D - \theta)$$

where $\varepsilon$ is a positive number and $\varepsilon < \theta$.

[0029] The coefficients, indicated by $\theta$ and $\varepsilon$, of $x_A, x_B, x_C, x_D$ need not strictly have the values given by the above
equation, but may have any values or, broadly stated, need not be defined by the above equation, insofar as the
input/output relationship between $x_A, x_B, x_C, x_D$ and $y$ is the same as in the above equation.

[0030] $y$ has a value of 1 with respect to the black dots in FIG. 2(a) and a value of 0 with respect to the white dots
in FIG. 2(a). In this manner, even if the original input/output pattern is linearly inseparable, it can be transformed into
linearly separable patterns, and can be expressed by neural network models.
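
The combining equation for $y$ can be checked numerically. The following is a minimal sketch, assuming the illustrative values $\theta = 1$ and $\varepsilon = 0.25$ and assuming nested regions (D inside C inside B inside A); the function names are hypothetical and not part of the specification.

```python
def L(z):
    """Threshold function: 1 when z > 0, otherwise 0 (z == 0 does not occur below)."""
    return 1 if z > 0 else 0


def combine(x_a, x_b, x_c, x_d, theta=1.0, eps=0.25):
    """Output neuron y = L((theta + eps)*x_A - 2*eps*x_B + 2*eps*x_C - 2*eps*x_D - theta)."""
    return L((theta + eps) * x_a - 2 * eps * x_b + 2 * eps * x_c - 2 * eps * x_d - theta)


# Region memberships for nested regions (assumed layout): outside A, A only, A and B, ...
cases = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
outputs = [combine(*c) for c in cases]
print(outputs)
```

Under these assumptions the output alternates between ON and OFF each time one more region boundary is crossed, which is the behaviour the sum and difference sets describe.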

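[0031] The above process is generalized by replacing $x_A, x_B, x_C, x_D$ with $x^1, x^2, x^3, x^4$, respectively, and rewriting the
above equations as follows:

$$x^1 = L(\mathbf{w}_A^T \mathbf{x} - \theta),$$

$$x^2 = L(\mathbf{w}_B^T \mathbf{x} - \theta),$$

$$x^3 = L(\mathbf{w}_C^T \mathbf{x} - \theta),$$

$$x^4 = L(\mathbf{w}_D^T \mathbf{x} - \theta),$$

and

$$y = L((\theta + \varepsilon)x^1 - 2\varepsilon x^2 + 2\varepsilon x^3 - 2\varepsilon x^4 - \theta)$$

If necessary, $x^5, x^6, x^7, \ldots$ may be added.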

[0032] The above general equations apply to a three-layer neural network having an input layer 1, a hidden layer 2,
and an output layer 3 as shown in FIG. 3, and indicate the relationship between state values $x^1, x^2, x^3, \ldots$ of neurons
of the hidden layer 2 and the state value $y$ of neurons of the output layer 3. The input layer 1 is composed of a plurality
of n neurons whose state values are indicated by $x_1, x_2, \ldots, x_n$, and the output layer 3 is composed of a single neuron
whose state value is indicated by $y$. These neurons are given in advance to the layers.

[0033] In FIG. 3, association coefficients between the hidden layer 2 and the output layer 3 are determined such that
the original input/output pattern is realized by a combination of linearly separable patterns realized by the neurons of
the hidden layer 2. Specifically, the association coefficients between the hidden layer 2 and the output layer 3 are
determined such that, using the positive number $\varepsilon$, the sum $(\theta + \varepsilon)$ of association coefficients between output neurons
and first through odd-numbered hidden neurons is greater than the threshold $\theta$, and the sum $(\theta - \varepsilon)$ of association
coefficients between output neurons and first through even-numbered hidden neurons is smaller than the threshold $\theta$.

[0034] The threshold function $L(\mathbf{w}^T \mathbf{x} - \theta)$ as a basis for the above mathematical model is known as the
McCulloch-Pitts neuron model. Since the threshold function $L(\mathbf{w}^T \mathbf{x} - \theta)$ can be implemented by hardware or software, the neural
network model shown in FIG. 3 can also be realized.
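
The rule for the hidden-to-output coefficients can be illustrated with a short sketch. The helper names `output_weights` and `partial_sums` are hypothetical, and the values $\theta = 4$, $\varepsilon = 1$ are chosen only so that the arithmetic stays in integers.

```python
from itertools import accumulate


def output_weights(k, theta, eps):
    """Coefficients for k hidden neurons: theta + eps, then -2*eps, +2*eps, ... alternately."""
    if not 0 < eps < theta:
        raise ValueError("expected 0 < eps < theta")
    # Hidden neuron j (1-based) gets -2*eps when j is even and +2*eps when j is odd (j >= 3).
    return [theta + eps] + [-2 * eps if j % 2 == 0 else 2 * eps for j in range(2, k + 1)]


def partial_sums(weights):
    """Sum of the coefficients of the first through j-th hidden neurons, for each j."""
    return list(accumulate(weights))


theta, eps = 4, 1
weights = output_weights(5, theta, eps)
sums = partial_sums(weights)
print(weights, sums)
```

With these values the partial sums alternate between $\theta + \varepsilon$ (first through an odd-numbered neuron) and $\theta - \varepsilon$ (first through an even-numbered neuron), so they lie alternately above and below the threshold.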

(2) Formation of linearly separable patterns: 

[0035] In the neural network wherein the association coefficients (weights) between the higher neurons, i.e., between
the hidden and output layers, are determined as described above in (1), association coefficients (indicated by $w_{ij}$, i =
1, 2, ..., n; j = 1, 2, ..., k in FIG. 3) between lower neurons, i.e., between the input and hidden layers, are determined
according to a learning process, described below, for generating linearly separable patterns from a linearly inseparable
pattern. The linearly inseparable pattern is then transformed into linearly separable patterns by the neural network
model with the association coefficients thus determined.

[Learning process]

[0036] If target and actual outputs are the same as each other, the weights are not altered.

[0037] If target and actual outputs are different from each other, then using $\alpha$ indicated by:

(equation missing in the source text)

the weights are altered according to the following equation:

$$\mathbf{w}^{d+1} = \mathbf{w}^{d} + \alpha \mathbf{x}$$

[0038] Having effected the above learning process, it is checked whether the alteration of the association coefficients
is 0 or not with respect to all input patterns. If the alteration of the association coefficients is 0 and there is a
pattern with respect to which target and actual outputs are different from each other, then it is determined that a linearly
inseparable pattern has been learned. If this is the case, then any subsequent patterns with respect to which target
and actual outputs are different from each other are stored, obtaining a set of linearly inseparable elements called a
"linearly inseparable core pattern set" that serves as an origin of the linearly inseparable pattern.
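
The distinction between a pattern that a single threshold neuron can learn and one that it cannot may be illustrated as follows. This is only a sketch: the expression for $\alpha$ is not available in the text above, so a fixed step size is assumed, and a pass budget stands in for the zero-alteration test. It is a plain error-correcting rule used as a stand-in, not the learning process of the specification, and all names are hypothetical.

```python
def train_threshold_unit(patterns, theta=1.0, alpha=0.3, max_passes=100):
    """Error-correcting training of one neuron with fixed positive threshold theta.

    patterns: list of (input tuple, target output) pairs.
    Returns (weights, converged). alpha is an assumed constant step size.
    """
    w = [0.0] * len(patterns[0][0])
    for _ in range(max_passes):
        changed = False
        for x, target in patterns:
            actual = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0
            if actual != target:
                # Move the weights toward (target 1) or away from (target 0) the input.
                step = alpha * (target - actual)
                w = [wi + step * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:
            return w, True   # every pattern already gives its target output
    return w, False          # misclassified patterns remain after the pass budget


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
and_weights, and_ok = train_threshold_unit(AND)
xor_weights, xor_ok = train_threshold_unit(XOR)
print(and_ok, xor_ok)
```

The AND pattern is linearly separable and the sketch settles on weights that reproduce it, whereas for the exclusive-OR pattern no weight vector exists, so some pattern is misclassified in every pass.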

[0039] If a pattern to be learned is linearly separable, then it can be learned according to the above learning process.
If a pattern to be learned is linearly inseparable, then a linearly inseparable core pattern set is obtained according to
the above learning process. Thereafter, one of the two patterns, e.g., an OFF pattern of the two ON/OFF patterns, is
selected from the obtained linearly inseparable core pattern set such that, for example, an input pattern with a maximum
norm is selected. The transformation is carried out by changing the selected OFF pattern to an ON pattern.

[0040] Based on the newly obtained ON/OFF pattern, the learning process is effected again, and the allocation is
finished when a certain finishing condition (described later on) is satisfied. If no finishing condition is satisfied, then a
linearly inseparable core pattern set is determined, and one of the OFF patterns thereof is changed to an ON pattern.
The above process is repeated until all patterns are made linearly separable.

[0041] A linearly separable pattern can be obtained according to the above learning process. The result of the process
is shown in FIG. 2(c). To obtain a pattern as shown in FIG. 2(d), the difference between the pattern shown in FIG. 2(c)
and the original pattern, i.e., the pattern shown in FIG. 2(a), is used as a new original pattern, and the above learning
process is effected to check if the new original pattern is linearly separable or not.

[0042] The aforesaid process is executed until the difference between two patterns becomes a linearly separable
pattern. Thereafter, the input/output patterns that have been given at first, including a linearly separable pattern, are
learned with the neural network, as shown in FIG. 3, including hidden layer neurons having weights corresponding to
the patterns. FIG. 3 will be described in detail later on.

[0043] The principle of the present invention is applicable not only to the three-layer neural network as shown in FIG.
3, but also to a multilayer neural network having an input layer 1, a plurality of hidden layers 2, and an output layer 3,
as shown in FIG. 21. In FIG. 21, the neural network has three hidden layers 2, and neurons of the first and third hidden
layers 2 as counted from the input layer 1 are generated as required as the learning process proceeds. Neurons of
the second hidden layer 2 are provided in advance as output neurons (corresponding to the output neuron y in FIG.
3) with respect to the neurons of the first hidden layer 2 adjacent to the input layer 1. FIG. 21 will also be described in
detail later on.

[0044] An embodiment of the present invention will now be described in greater detail by way of mathematical models,
successively in topics as follows:

(1) Linear separability;

(2) Linearly separable allocation algorithm;

(3) Determination of separating hyperplanes; and

(4) Overall algorithm.

(1) Linear separability: 

[0045] The input/output relationship between neurons is expressed by:

$$x_{out} = h(\mathbf{x}^T \mathbf{w} - \theta) \qquad (1)$$

where the transformation function $h$ is a continuous threshold function represented by:

$$h(z) = 1 \text{ when } z > 0, \text{ and } h(z) = 0 \text{ when } z < 0 \qquad (2)$$

where $\mathbf{x}$ is an n-dimensional binary vector (Boolean vector) indicating the state value of input neurons, $x_{out}$ is a binary
scalar indicating the state value of output neurons, $\mathbf{w} \in R^n$ is an association weight vector of a synapse corresponding
to the input vector $\mathbf{x}$, and $\theta \in R^1$ is a neuron threshold in the form of a positive constant.

[0046] The term "learning" denotes a process in which the association weights $\mathbf{w}$ are altered until the output $x_{out}$ of
a network composed of a neuron model according to the equation (1) above has a desired value with respect to an
input of m n-dimensional binary vectors $\mathbf{x}$.
[0047] It is assumed that the set of m n-dimensional input vectors $\mathbf{x} = (x_i)$ is represented by $X$, a set of input vectors
whose target output is 1 is referred to as an ON pattern set and indicated by $X_{ON}$, and a set of input vectors whose
target output is 0 is referred to as an OFF pattern set and indicated by $X_{OFF}$. The elements of these latter two sets are
called "ON patterns" and "OFF patterns", respectively. The following assumption is provided for these ON and OFF
patterns:

Assumption a: $X_{ON} \cap X_{OFF} = \emptyset$ (empty set).

[0048] A process for determining $\mathbf{w}$ which satisfies the equations:

$$h(\mathbf{x}^T \mathbf{w} - \theta) = 1 \quad (\mathbf{x} \in X_{ON}),$$

$$h(\mathbf{x}^T \mathbf{w} - \theta) = 0 \quad (\mathbf{x} \in X_{OFF}) \qquad (3),$$

that is, the inequalities:

$$\mathbf{x}^T \mathbf{w} > \theta \quad (\mathbf{x} \in X_{ON}),$$

$$\mathbf{x}^T \mathbf{w} < \theta \quad (\mathbf{x} \in X_{OFF}) \qquad (4)$$

is the learning process.

[0049] If there is a solution $\mathbf{w}$ according to the above formulas, then there is a hyperplane:

$$\mathbf{x}^T \mathbf{w} = \theta \qquad (5)$$

which separates the pattern sets $X_{ON}$, $X_{OFF}$ in a strong sense. At this time, the sets $X_{ON}$, $X_{OFF}$ are said to be linearly
separable. If not, the sets $X_{ON}$, $X_{OFF}$ are said to be linearly unseparable.
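
The inequalities (4) translate directly into a check that a candidate weight vector separates the two pattern sets in the strong sense. The following is a minimal sketch; the function name and the example weights are illustrative assumptions.

```python
def separates_strongly(w, theta, x_on, x_off):
    """True when x.w > theta for every ON pattern and x.w < theta for every OFF pattern."""
    def dot(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return all(dot(x) > theta for x in x_on) and all(dot(x) < theta for x in x_off)


# Two-input AND: the only ON pattern is (1, 1).
x_on = [(1, 1)]
x_off = [(0, 0), (0, 1), (1, 0)]
print(separates_strongly((2, 2), 3, x_on, x_off))   # hyperplane 2*x1 + 2*x2 = 3
print(separates_strongly((4, 4), 3, x_on, x_off))   # (0, 1) and (1, 0) fall on the ON side
```

A pattern lying exactly on the hyperplane satisfies neither inequality, so such a weight vector is rejected by this check.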

[0050] In the learning of a binary neural network, only linearly separable patterns can be handled between two layers
insofar as the McCulloch-Pitts neuron model is employed. To learn linearly inseparable patterns, therefore, it is
necessary to use a neural network of three or more layers, and to handle a linearly separable input/output relationship
between the layers, so that a linearly inseparable input/output relationship can be achieved as a whole. It has already
been proved that any arbitrary input/output relationship can be achieved by a neural network of three or more layers.

[0051] In considering a learning algorithm for a neural network of three or more layers, it is important how to
determine a teaching signal.

[0052] According to the present embodiment, a pattern allocation process is employed to divide a linearly inseparable
pattern into a plurality of linearly separable patterns and allocate and learn the linearly separable patterns between the
input and hidden layers.

(2) Linearly separable allocation algorithm: 

[0053] A suitable target output is given to the hidden layer for a linearly inseparable pattern to be learned. To explain
such a process, a learning pattern as shown in FIG. 4 is provided.

[0054] FIG. 4 shows a linearly separable allocation by way of example. Specifically, in a general $\mathbf{x}$ (vector) space,
half spaces $S^k_{ON}$ (k = 1, ..., 5) are determined with respect to a given pattern to be learned. Spaces indicated by the
arrows in FIG. 4 are representative of the regions of $S^k_{ON}$.

[0055] If it is assumed that black dots of those denoted by a through i are ON patterns and white dots OFF patterns
in FIG. 4, then the pattern sets $X_{ON}$, $X_{OFF}$ are given as follows:

$$X_{ON} = \{b, c, f, g, i\},$$

$$X_{OFF} = \{a, d, e, h\}.$$

[0056] Obviously, these pattern sets are linearly inseparable. If, however, hyperplanes (two-dimensionally, straight
lines) are introduced as shown in FIG. 4 and half spaces (indicated by the arrows) represented by these hyperplanes
are expressed by:

$$S^1_{ON} = \{\mathbf{x} \mid \mathbf{w}^{1T} \mathbf{x} > \theta\},$$

$$\vdots$$

$$S^5_{ON} = \{\mathbf{x} \mid \mathbf{w}^{5T} \mathbf{x} > \theta\} \qquad (6),$$

then whether patterns $\mathbf{x}$ are contained in these half spaces can be determined by checking if the state values of neurons
that are expressed by:

$$x^1 = h(\mathbf{w}^{1T} \mathbf{x} - \theta),$$

$$\vdots$$

$$x^5 = h(\mathbf{w}^{5T} \mathbf{x} - \theta) \qquad (7)$$

are 1 or not. The half space $S^1_{ON}$ is realized by the first neuron $x^1$. While the target outputs (the separation of the
OFF and ON patterns) of the patterns a, b, c, f, g, i are realized by $x^1$, the separation of the OFF patterns d, e, h is not
realized. To realize the separation of the OFF patterns d, e, h, a difference set $S^1_{ON} - S^2_{ON}$ which is produced by
subtracting $S^2_{ON}$ from $S^1_{ON}$ is considered.
[0057] The difference set $S^1_{ON} - S^2_{ON}$ can be determined by checking if the state value of a neuron expressed by:

$$x_{1,2} = h((\theta + \varepsilon)x^1 - 2\varepsilon x^2 - \theta)$$

according to the neuron model (1) is 1 or not. Although desired output values for the patterns a, b, c, d, e, h are obtained
by $x_{1,2}$, no satisfactory outputs are obtained for the patterns f, g, i. To realize the patterns f, g, i, $S^3_{ON}$ is added, providing
a set:

$$(S^1_{ON} - S^2_{ON}) \cup S^3_{ON}.$$

The state value of a neuron expressed by:

$$x_{1,3} = h((\theta + \varepsilon)x^1 - 2\varepsilon x^2 + 2\varepsilon x^3 - \theta)$$

obtained with respect to the above set realizes all the patterns except h.

[0058] If the above argument is applied to obtain desired outputs for all the patterns, then a neuron for giving a
desired output $y$ is expressed by:

$$y = h((\theta + \varepsilon)x^1 - 2\varepsilon x^2 + 2\varepsilon x^3 - 2\varepsilon x^4 + 2\varepsilon x^5 - \theta) \qquad (8)$$

where $\varepsilon > 0$. The equations (7), (8) can be generalized respectively into:

$$x^k = h(\mathbf{w}^{kT} \mathbf{x} - \theta) \qquad (9)$$

and

$$y = h((\theta + \varepsilon)x^1 - 2\varepsilon x^2 + 2\varepsilon x^3 \cdots - \theta) \qquad (10)$$

and can be achieved by the network as shown in FIG. 3.

[0059] FIG. 3 shows the arrangement of the neural network according to the present embodiment. The hidden layer
2 of the neural network shown in FIG. 3 has as many neurons (k neurons) as required by the algorithm.

[0060] The weight coefficients between the output layer 3 and the hidden layer 2 have a value of $\theta + \varepsilon$ only between
$x^1$ and $y$, and have alternate values of $-2\varepsilon$ and $2\varepsilon$ for $x^2$ and the subsequent neurons.

[0061] In order to realize a linearly inseparable learning pattern, there may be determined separating hyperplanes
which alternately divide only ON patterns and only OFF patterns, as described above, with respect to the given learning
pattern.

[0062] Once the separating hyperplanes have been determined, the association weight between the output neuron
$y$ and the first neuron $x^1$ of the hidden layer is made larger than the threshold $\theta$ by $\varepsilon$ as indicated by the equation (8),
and the association weights between the output neuron $y$ and the second and following neurons $x^2, x^3, \ldots$ are
equalized to those having an absolute value of $2\varepsilon$ with alternate signs, in order to express the state in which the ON
and OFF patterns are alternately divided by the hyperplanes.

[0063] If the half plane $S^2_{ON}$ shown in FIG. 4 were directed in the opposite direction, then only $x^1$ would be cancelled out
by $x^2$ in its ON region, and $x_{1,2}$ would have the same value as $x^1$ at all times.

[0064] To prevent $S^2_{ON}$ from being determined in this way, the following relationship must be satisfied:

$$(S^{k+1}_{ON} \cap X) \subset (S^k_{ON} \cap X) \qquad (11).$$

[0065] In view of the above argument and the fact that the input pattern set $X$ is a Boolean set, the algorithm for the
pattern allocation method can generally be described as follows:

Algorithm α (Linearly separable allocation algorithm):

[Step 1]

[0066] With the iteration (repetition) number being represented by k and an input vector set which will be described
later on being represented by $X^k$, it is first assumed that:

$X^0 = X$ (input pattern set), $X^1_{ON} = X_{ON}$ (ON pattern set), $X^1_{OFF} = X_{OFF}$ (OFF pattern set), and k = 1.

[Step 2]

[0067] $X^{k-1}$ is separated into a half space containing all elements of $X^k_{ON}$ and a half space containing only
elements of $X^k_{OFF}$, at least one of them. Hyperplanes which satisfy the relationship (11), i.e., hyperplanes:

$$\mathbf{w}^{kT} \mathbf{x} = \theta \qquad (12)$$

which satisfy the following relationship:

$$\mathbf{w}^{kT} \mathbf{x} > \theta, \quad \forall \mathbf{x} \in X^k_{ON},$$

$$\mathbf{w}^{kT} \mathbf{x} < \theta, \quad \exists \mathbf{x} \in X^k_{OFF}, \text{ and}$$

$$\{\mathbf{x} \in X \mid \mathbf{w}^{kT} \mathbf{x} > \theta\} \subset X^{k-1} \qquad (13)$$

are determined.

[0068] The set of input vectors contained in half spaces on the $X^k_{ON}$ sides of these hyperplanes is represented by
$X^k$ which is given by:

$$X^k = \{\mathbf{x} \in X \mid \mathbf{w}^{kT} \mathbf{x} > \theta\} \qquad (14).$$



[Step 3]

[0069] If $X^k = X^k_{ON}$, then the algorithm is finished. If not, then

$$X^{k+1}_{ON} = X^k - X^k_{ON},$$

$$X^{k+1}_{OFF} = X^k_{ON} \qquad (15).$$

The iteration number k is set to k + 1, and control goes to the step 2.
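
Steps 1 through 3 can be sketched in a few lines for very small Boolean inputs. The hyperplane search below is a brute-force scan over a small integer weight grid, which is an assumption made purely for illustration and is not the hyperplane determination procedure of this specification (topic (3) above). The names `find_hyperplane`, `allocate` and `network_output`, and the values of `THETA` and `EPS`, are likewise hypothetical.

```python
from itertools import product

THETA = 3  # positive threshold (value assumed for the example)
EPS = 1    # positive number smaller than THETA (value assumed for the example)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def find_hyperplane(X, prev, on, off, span=4):
    """Scan integer weights in [-span, span] for one satisfying the conditions (13)."""
    best = None
    for w in product(range(-span, span + 1), repeat=len(X[0])):
        if any(dot(w, x) == THETA for x in X):
            continue  # keep every pattern strictly off the hyperplane
        region = {x for x in X if dot(w, x) > THETA}           # X^k of equation (14)
        if on <= region and off - region and region <= prev:   # the three conditions (13)
            if best is None or len(region) < len(best[1]):
                best = (w, region)
    return best


def allocate(X, x_on):
    """Algorithm alpha: returns the list of hidden-neuron weight vectors w^1, w^2, ..."""
    X = [tuple(x) for x in X]
    on, off, prev = set(x_on), set(X) - set(x_on), set(X)      # step 1
    weights = []
    while True:
        found = find_hyperplane(X, prev, on, off)              # step 2
        if found is None:
            raise ValueError("no suitable hyperplane in the search grid")
        w, region = found
        weights.append(w)
        if region == on:                                       # step 3: finished
            return weights
        on, off, prev = region - on, on, region                # equations (15)


def network_output(weights, x):
    """Output neuron of equation (10) applied to the hidden states of equation (9)."""
    hidden = [1 if dot(w, x) > THETA else 0 for w in weights]
    coeffs = [THETA + EPS] + [-2 * EPS if j % 2 else 2 * EPS for j in range(1, len(weights))]
    return 1 if dot(coeffs, hidden) - THETA > 0 else 0


X = list(product((0, 1), repeat=2))
xor_on = {(0, 1), (1, 0)}
hyperplanes = allocate(X, xor_on)
print(hyperplanes, [network_output(hyperplanes, x) for x in X])
```

For the two-input exclusive-OR pattern this sketch stops after two iterations, so two hidden neurons are generated. Because each accepted region is a strict subset of the previous one, the loop cannot run indefinitely, although the limited weight grid may fail to contain a suitable hyperplane for larger inputs.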

[0070] The above algorithm attempts to separate the learning pattern $X^{k-1}$ into the sets $X^k_{ON}$, $X^k_{OFF}$ with the
hyperplanes $\mathbf{w}^{kT} \mathbf{x} = \theta$ in each iteration. If the separation is successful, then the algorithm is finished. If not, the learning
pattern $X^{k-1}$ is separated into a half space containing all elements of $X^k_{ON}$ and a half space containing at least one of
the elements of $X^k_{OFF}$ with the hyperplanes $\mathbf{w}^{kT} \mathbf{x} = \theta$.

[0071] For such separation, the relationship (11) must be satisfied as described above. To meet this requirement,
the conditional formula (13) which is equivalent to the relationship (11) is added to the algorithm. From this condition
and the equation (14) results:

$$X^{k-1} \supset X^k,$$

and the number of elements of $X^k$ is reduced by at least one for each iteration.

[0072] The half space separated from the set $X^k_{ON}$ by the hyperplanes $\mathbf{w}^{kT} \mathbf{x} = \theta$ contains only the elements of $X^k_{OFF}$,
and these elements are separated from $X^k$ by the hyperplanes. On the $X^k_{ON}$ sides of the hyperplanes, there exist all
the elements of $X^k_{ON}$ and the remaining elements of $X^k_{OFF}$ which are mixed together without being separated. The
next iteration attempts to separate portions of $X^k_{ON}$, $X^k_{OFF}$ which have not been separated in the present iteration.
[0073] At this time, not only are the ON and OFF patterns simply separated from each other, but, as described with
reference to FIG. 4, hyperplanes for alternately separating the ON and OFF patterns must be determined, and $X_{ON}$
must be expressed using sum and difference sets alternately as indicated by the equation (10) above. To this end,
target outputs for all the elements of the set $X^k_{ON}$ and some elements of the set $X^k_{OFF}$, which have not been fully
separated, are reversed, providing a learning pattern in the next iteration.

[0074] The set $X^k_{ON}$ may be reversed by replacing $X^k_{ON}$ with $X^{k+1}_{OFF}$ as indicated by the second one of the equations
(15). However, the reversal of the set $X^k_{OFF}$ cannot simply be effected.

[0075] Since the elements of $X^k_{OFF}$ in the half space that does not contain $X^k_{ON}$, which is separated by the hyperplanes
$\mathbf{w}^{kT} \mathbf{x} = \theta$, have already been only an OFF or ON pattern depending on the iteration, these elements are separated from
the other elements of $X^k$ on the opposite sides of the hyperplanes. These elements are considered already divided
patterns in FIG. 4. Under the condition (12) above, these patterns are not contained in the ON sides of the hyperplanes
that are determined in a subsequent iteration, and their outputs remain unchanged. Therefore, the elements of $X^k_{OFF}$
within the set $X^k$, where the above patterns have been excluded from $X$, are reversed. The result produced by subtracting
$X^k_{ON}$ from $X^k$ is updated as $X^{k+1}_{ON}$, as indicated by the first one of the equations (15).
[0076] Through the above updating process, $X^k_{ON}$ and $X^k_{OFF}$ are equal to those which are produced by reversing
subsets of the original ON and OFF pattern sets in each iteration, as indicated by:

$$X^k_{ON} \subset X_{ON} \quad \text{(k is an odd number)},$$

$$X^k_{ON} \subset X_{OFF} \quad \text{(k is an even number)} \qquad (16)$$

where $X^k$ corresponds to $S^k_{ON}$ in FIG. 4, and $X_{ON}$ is expressed by:

$$X_{ON} = (X^1 - X^2) \cup \cdots \cup (X^{k-1} - X^k) \quad \text{(k is an even number)},$$

$$X_{ON} = (X^1 - X^2) \cup \cdots \cup (X^{k-2} - X^{k-1}) \cup X^k \quad \text{(k is an odd number)}.$$

[0077] The existence of the separating hyperplanes which satisfy the relationship (13) in the step 2 of the algorithm
α is proved by the following [Theorem 1]:

[Theorem 1]:

[0078] Any single point in an n-dimensional Boolean vector set $B^n$ is separable from the remaining set in a strong
sense.
(Proof)

[0079] An arbitrary element $x^a$ is taken from the set $B^n$. The remaining set which is left by taking $x^a$ from $B^n$ is denoted by U. It is checked whether $x^a$ can be expressed by the convex association of the elements of the set U.
[0080] It is assumed that

$$U = \{x_1, x_2, \ldots, x_q \in B^n\}.$$

[0081] If $x^a$ can be expressed by the convex association of the elements of the set U, then $x^a$ is represented by the following equations:

$$x^a = \sum_{i=1}^{q} \mu_i x_i, \qquad \sum_{i=1}^{q} \mu_i = 1 \quad (\mu_i \ge 0,\ i = 1, \ldots, q). \tag{17}$$

40 

[0082] The vectors $x^a, x_1, x_2, \ldots, x_q$ have elements of 0 or 1, and are all different from each other. Therefore, there exists j such that at least one of the elements $x_{1j}, x_{2j}, \ldots, x_{qj}$ has a value different from the other elements. The jth element of the righthand side of the first one of the equations (17):

$$\sum_{i=1}^{q} \mu_i x_{ij}$$

has a value of $x_{ij}$, i.e., 0 or 1, if only one $\mu_i$ is 1. However, if only one $\mu_i$ is 1, then since the vectors $x^a, x_1, x_2, \ldots, x_q$ are all different from each other, the first one of the equations (17) is not satisfied. If two or more $\mu_i$ are nonzero, then in order that the inequality:

$$0 < \sum_{i=1}^{q} \mu_i x_{ij} < 1 \tag{18}$$

is not satisfied with respect to all j, but

$$\sum_{i=1}^{q} \mu_i x_{ij}$$

has a value of 0 or 1, the vectors $x_i$ with respect to $\mu_i$ which is not 0 must all be the same. This is contrary to the fact that the vectors $x_1, x_2, \ldots, x_q$ are all different from each other. Therefore, there is no $\mu_i$, $i = 1, \ldots, q$, which would satisfy the equations (17). Consequently, $x^a$ is not expressed by the convex association of the elements of the set U, and is not contained in the convex hull of the set U.

[0083] It can be said from the above discussion that the convex hull of the set U does not contain $x^a$. This fact, together with the following separation theorem, indicates that the convex hull of the set U and $x^a$ can be separated by hyperplanes.
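For a single Boolean point, a separating hyperplane can also be written down directly. The following sketch is a constructive illustration, not the convex-hull argument used in the proof: it assumes the particular choice $c = 2x^a - 1$ and $\alpha = \sum_j x^a_j - 0.5$ (both chosen here for illustration) and checks by brute force that every vertex of $B^4$ is strictly separated from the other fifteen.

```python
from itertools import product


def separating_hyperplane(point):
    """Return (c, alpha) such that c.x > alpha holds only for x == point."""
    c = [2 * bit - 1 for bit in point]   # +1 where the point has a 1, -1 where it has a 0
    # c.point equals sum(point); every other vertex scores at most sum(point) - 1.
    alpha = sum(point) - 0.5
    return c, alpha


def is_strictly_separated(point, n):
    """Check the strict inequalities for all 2^n vertices of the Boolean cube."""
    c, alpha = separating_hyperplane(point)
    for x in product((0, 1), repeat=n):
        score = sum(ci * xi for ci, xi in zip(c, x))
        if x == tuple(point):
            if not score > alpha:
                return False
        elif not score < alpha:
            return False
    return True


n = 4
all_separable = all(is_strictly_separated(p, n) for p in product((0, 1), repeat=n))
print(all_separable)
```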

[Separation theorem]

[0084] It is assumed that Γ and Λ are two convex sets which are not empty, with Γ being compact and Λ being closed. If the convex sets Γ and Λ do not intersect with each other, then there exists a plane $\{x \mid x \in R^n,\ c \cdot x = \alpha\}$ $(c \ne 0)$ which separates the convex sets Γ and Λ from each other in a strong sense, and vice versa. Stated otherwise, the relationship:

$$\Gamma \cap \Lambda = \emptyset \iff \exists\, c \ne 0 \text{ and } \alpha: \quad x \in \Gamma \rightarrow c \cdot x < \alpha, \quad x \in \Lambda \rightarrow c \cdot x > \alpha$$

is satisfied.

[0085] The fact that the convex hull of the set U and $x^a$ can be separated by hyperplanes is equivalent to the fact that the set U and $x^a$ can be separated by hyperplanes. However, it is generally not easy to determine the hyperplanes $w^{kT}x = \theta$. Instead of directly determining the hyperplanes $w^{kT}x = \theta$, some of the OFF patterns of the original patterns are replaced with ON patterns to define half spaces for determining these hyperplanes. These patterns must of necessity be linearly separable. According to the present invention, the above separating hyperplanes are obtained by giving these patterns to the hidden layer neurons for learning.

[0086] Inasmuch as a plurality of linearly separable patterns thus generated with respect to the original pattern are allocated to the hidden layer neurons for learning, this process is called the "Linearly Separable Allocation Method", which will be referred to as "LISA" in short.

(3) Determination of separating hyperplanes:

[0087] Given Theorem 1, the next problem (the original problem) is how to determine separating hyperplanes which satisfy the conditions of the step 2 of the algorithm α. This is equivalent to the determination of w expressed by the relationship (4) above.

[0088] It is assumed, for example, that in the event a given pattern is not linearly separable, a pattern set which causes the given pattern to be linearly inseparable is obtained. By replacing ON and OFF patterns of the pattern set thus obtained, the pattern as a whole can be made linearly separable. In the present invention, the pattern set which causes the linear inseparability is called the "Linearly Unseparable Core Pattern Set", which will be referred to as "LUCPS" in short.

[0089] A "dual problem" which serves as a basis for defining LUCPS will be described below, and then an "optimization problem with respect to the original problem" will be derived for actually determining LUCPS. Furthermore, the relationship between optimization conditions for the optimization problem and linearly unseparable core patterns, and the process of extracting LUCPS, will also be described below.

(3)-1 Dual problem:

[0090] In a preparatory step, suffix sets $I_{ON}$, $I_{OFF}$ are produced from the ON pattern set $X_{ON}$ and the OFF pattern set $X_{OFF}$ as follows:

$$I_{ON} = \{i \mid x_i \in X_{ON}\},$$
$$I_{OFF} = \{i \mid x_i \in X_{OFF}\}.$$

[0091] Auxiliary Theorem 1, given below, is applied to the original problem (4), deriving Theorem 2.

[Auxiliary Theorem 1]

[0092] Given a matrix $A \in R^{m \times n}$ and a vector $b \in R^m$,

I. there is a solution $x \in R^n$ for $Ax > b$, or

II. there is a solution $y \in R^m$ for $A^T y = 0$, $b^T y \ge 0$, $y \gneq 0$.

However, the statements I, II do not hold true at the same time. $y \gneq 0$ indicates that all elements of y are 0 or more and at least one element is not 0.

(Proof)

[0093] The statement I is equivalent to

I'. there is a solution $x \in R^n$, $\xi \in R$ for $\xi > 0$, $Ax > b\xi$.

If Gordan's theorem is applied to the statement I', then it is true or

II'. there is a solution y for $A^T y = 0$, $(y^T, b^T y) \gneq 0$.

However, the statements I', II' do not hold true at the same time. If y = 0, then $b^T y = 0$, so that $(y^T, b^T y) \gneq 0$ has the same meaning as

$$y \gneq 0, \quad b^T y \ge 0.$$

Therefore, either the statement I is true or the statement II is true.

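[Theorem 2]

[0094] With respect to θ having a certain fixed positive value,

I. there is a solution w for

$$x_i^T w > \theta,\ i \in I_{ON}, \qquad x_i^T w < \theta,\ i \in I_{OFF},$$

or

II. there is a solution λ for

$$\sum_{i \in I_{ON}} \lambda_i x_i = \sum_{i \in I_{OFF}} \lambda_i x_i, \qquad \sum_{i \in I_{ON}} \lambda_i \ge \sum_{i \in I_{OFF}} \lambda_i, \qquad \lambda \gneq 0. \tag{19}$$

However, the statements I, II do not hold true at the same time.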
(Proof)

[0095] A learning pattern matrix $V \in R^{m \times n}$ is generated as follows:

$$v_i = \begin{cases} x_i & (i \in I_{ON}) \\ -x_i & (i \in I_{OFF}) \end{cases} \tag{20}$$

From $v_i$ (i = 1, 2, ..., m), a matrix V is produced as follows:

$$V = \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_m^T \end{pmatrix} \tag{21}$$

Likewise,

$$\gamma_i = \begin{cases} \theta & (i \in I_{ON}) \\ -\theta & (i \in I_{OFF}) \end{cases} \tag{22}$$

Then, the statement I is equivalent to

I'. there is a solution w for $Vw > \gamma$.

From Auxiliary Theorem 1, the statement I' holds true, or

II'. there is a solution λ for $V^T \lambda = 0$, $\gamma^T \lambda \ge 0$, $\lambda \gneq 0$.

However, the statements I', II' do not hold true at the same time.

[0096] Using the equation (21), the statement II' is rewritten as follows:

$$\sum_{i=1}^{m} \lambda_i v_i = 0, \qquad \sum_{i=1}^{m} \gamma_i \lambda_i \ge 0, \qquad \lambda \gneq 0$$

[0097] Furthermore, from the equations (20), (22),

$$\sum_{i \in I_{ON}} \lambda_i x_i - \sum_{i \in I_{OFF}} \lambda_i x_i = 0, \qquad \theta \left( \sum_{i \in I_{ON}} \lambda_i - \sum_{i \in I_{OFF}} \lambda_i \right) \ge 0 \tag{23}$$

Since θ is a fixed positive number, the second one of the equations (23) becomes:

$$\sum_{i \in I_{ON}} \lambda_i \ge \sum_{i \in I_{OFF}} \lambda_i$$

Therefore, the statement I is true or the statement II is true.

[0098] Since the solutions do not exist at the same time, the problem expressed by the equations (19) is called a dual problem for the original problem (4). The solution λ to the dual problem (19) can be regarded as a positive association coefficient determined such that linear combinations of ON and OFF pattern vectors are equal to each other, and indicates that the sum of association coefficients with respect to ON patterns is equal to or greater than the sum of association coefficients with respect to OFF patterns.
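As a worked illustration of the dual problem, the sketch below tests whether a given coefficient vector satisfies the three conditions described above: equal weighted sums of ON and OFF pattern vectors, an ON coefficient sum at least as large as the OFF coefficient sum, and coefficients that are nonnegative and not all zero. The function name `is_dual_solution` and the XOR and AND examples are illustrative choices, not taken from the original text. A `False` result only means that the particular coefficients tried are not a dual solution.

```python
def is_dual_solution(x_on, x_off, lam_on, lam_off):
    """Check the conditions of the dual problem for the given coefficients."""
    n = len((x_on + x_off)[0])
    # Weighted sums of the ON and OFF pattern vectors, component by component.
    on_sum = [sum(l * x[j] for l, x in zip(lam_on, x_on)) for j in range(n)]
    off_sum = [sum(l * x[j] for l, x in zip(lam_off, x_off)) for j in range(n)]
    lam = list(lam_on) + list(lam_off)
    nonnegative_and_nonzero = all(l >= 0 for l in lam) and any(l > 0 for l in lam)
    return (on_sum == off_sum
            and sum(lam_on) >= sum(lam_off)
            and nonnegative_and_nonzero)


# XOR: ON = {(0,1), (1,0)}, OFF = {(0,0), (1,1)}; all coefficients set to 1.
xor_has_dual = is_dual_solution([(0, 1), (1, 0)], [(0, 0), (1, 1)], [1, 1], [1, 1])

# AND: ON = {(1,1)}, OFF = the other three vertices. The same coefficients fail
# because the ON coefficients sum to 1, which is less than the OFF sum of 3.
and_has_dual = is_dual_solution([(1, 1)], [(0, 0), (0, 1), (1, 0)], [1], [1, 1, 1])
print(xor_has_dual, and_has_dual)
```

For XOR the dual problem has a solution, which agrees with XOR being linearly inseparable.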
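[0099] The dual problem can be rewritten as follows:

$$\sum_{i \in I_{ON}} \mu_i x_i = c \sum_{i \in I_{OFF}} \mu_i x_i, \qquad \sum_{i \in I_{ON}} \mu_i = \sum_{i \in I_{OFF}} \mu_i = 1, \qquad \mu \ge 0, \quad 0 \le c \le 1 \tag{24}$$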



[0100] The equations (24) show that they have a solution μ when some of the convex associations of the ON pattern set $X_{ON}$ are contained in a cone which is made up of the convex associations of the OFF pattern set $X_{OFF}$ and the origin, as schematically shown in FIG. 5.

[0101] FIGS. 5(a) and 5(b) show the meaning of a dual problem in an x space. If convex associations of an ON pattern set are contained in a cone made up of the origin and the convex associations of an OFF pattern set, then the dual problem has a solution and the original problem has no solution. In FIG. 5(a), the dual problem has no solution, and the original problem has a solution. In FIG. 5(b), the dual problem has a solution, and the original problem has no solution.

[0102] It can be seen from FIGS. 5(a) and 5(b) that linear separability is determined by the relative positional relationship between all patterns. However, as shown in FIGS. 6(a) and 6(b), with respect to a linearly inseparable pattern set, an original pattern can be transformed into a linearly separable pattern by replacing some ON and OFF patterns with each other in the linearly inseparable pattern set.

[0103] FIGS. 6(a) and 6(b) illustrate transformation of a linearly inseparable pattern. In FIG. 6(a), a cone made up of the origin and the convex associations of an OFF pattern set contains convex associations of an ON pattern set. When some (two upper left points) of the patterns of the OFF pattern set of the cone are changed to ON patterns as shown in FIG. 6(b), the cone no longer contains any ON patterns.

[0104] The fact that some convex associations of the ON pattern set $X_{ON}$ are contained in the cone made up of the origin and the convex associations of the OFF pattern set $X_{OFF}$ in FIGS. 5(a), 5(b), 6(a), and 6(b) indicates that there is a solution μ to the problem (24), and the solution μ corresponds to the solution λ to the problem (19). The solution λ is an association coefficient in the case where the linear combinations of the elements of $X_{ON}$, $X_{OFF}$ are the same as each other. If the components of the association coefficient λ can be divided into 0 components and positive components, then the positive components are considered to determine the solution to the dual problem. Thus, the positive components of λ are involved in linear separability of the original problem.

[0105] Generally, if the number of patterns is larger than the number of dimensions, then all the patterns are linearly dependent, and the solution λ to the dual problem for most of the patterns can be positive. However, since the component value of a Boolean vector is 0 or 1, the dimension of the space defined by the patterns is smaller than for patterns of continuous values. This does not apply if the number of learning patterns is close to the number of all combinations that can be possible with original input/output patterns, though such a case does not generally occur.
[0106] If the positive component of a solution to a dual problem is determined, therefore, a linearly separable pattern set $X^k$ determined in the step 2 of the algorithm α should be obtained by changing OFF patterns to ON patterns. Hidden layers for achieving, through a learning process, the linearly separable pattern set $X^k$ thus obtained, and output neurons for combining those hidden layers are determined to reconstruct the original patterns according to the algorithm α, thereby achieving the original linearly inseparable patterns.

[0107] Based on the foregoing, a "linearly inseparable core pattern set", which is a pattern set serving as a basis for linear inseparability, is defined as follows:

[Definition 1]

[0108] A set $I_{LU}$ of patterns with respect to nonzero elements of the solution $\lambda = [\lambda_i]$ to the dual problem (19) is defined as a linearly inseparable core pattern set (LUCPS) as follows:

$$I_{LU} = \{i \mid \lambda_i > 0\}$$

[0109] It is possible to determine the solution to a dual problem by applying a condition λ > 0 to a sweeping-out method (Gaussian elimination), which checks if simultaneous equations are solvable or not. As the number of problem dimensions and the number of patterns increase, however, a combinatorial explosion takes place.
[0110] Consequently, a more practical method will be described below.

(3)-2 Optimization problem with respect to the original problem:

[0111]

[0112] An optimization problem to determine a linearly inseparable core pattern set $I_{LU}$ is formulated below.
[0113] For speeding up convergence, the following transformation functions are introduced into ON and OFF patterns:

$$h_{ON}(z) = \begin{cases} 1, & z \ge 0 \\ 1 + z, & z < 0 \end{cases} \qquad h_{OFF}(z) = \begin{cases} z, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{25}$$

[0114] These functions are represented as shown in FIGS. 7(a) and 7(b), and cannot be differentiated when z = 0.
[0115] $h_{ON}$ is provided by modifying the transformation function according to the equations (2). A portion thereof where the argument is positive, i.e., where the original problem is admissible, is equal to h. A portion where the argument is negative, i.e., where the original problem is inadmissible, is linear.

[0116] In sigmoid functions, a gradient exists only in the vicinity of a region where the argument is 0. In the above transformation functions, a constant gradient is present at all times in a portion where the original problem is inadmissible. Therefore, irrespective of the value of the association weight $w^k$, stable and quick convergence can be achieved in the learning process. For the same reasons, $h_{OFF}$ is expressed according to the second one of the equations (25).
[0117] The transformation functions $h_{ON}$, $h_{OFF}$ are allocated to the ON and OFF patterns, as described above, and the patterns are individually considered. By maximizing $h_{ON}$, there are determined weights that realize the ON patterns, and by maximizing $-h_{OFF}$, there are determined weights that realize the OFF patterns. The following optimization problem will be considered in order to determine the weights for realizing the ON and OFF patterns:



$$\max_{w} \left( \sum_{x \in X_{ON}} h_{ON}(x^T w - \theta) - \sum_{x \in X_{OFF}} h_{OFF}(x^T w - \theta) \right) \tag{26}$$

[0118] With the sums of $h_{ON}$, $-h_{OFF}$ being maximized, the solution search is conducted toward regions where the output is 1 with respect to the ON patterns, and also toward regions where the output is 0 with respect to the OFF patterns. If the input pattern is linearly separable, then the solution of the above problem gives an admissible solution to the original problem (4). The formula (26) is a maximization problem for the function:

$$\phi(w) = \sum_{x \in X_{ON}} h_{ON}(x^T w - \theta) - \sum_{x \in X_{OFF}} h_{OFF}(x^T w - \theta) \tag{27}$$

Since $h_{ON}$ is a concave function and $h_{OFF}$ is a convex function, φ is a concave function with respect to w, and the local maximum of φ is equal to a global maximum. The problem (26) can be solved by a gradient method for an undifferentiable optimization problem.

(3)-3 Relationship between an optimization condition and a dual problem:

[0119]

[0120] An optimization condition is determined from an optimum solution to the optimization problem (26), and it will be indicated that the coefficient of the optimization condition is equal to the solution to a dual problem. It can be seen that if the original problem (4) is linearly inseparable, a pattern of nonzero (positive) coefficients of an optimization condition equation with respect to the optimum solution to the problem (26), i.e., LUCPS, can be determined.

[0121] The optimization condition for the problem (26) will first be described below.

[0122] The function φ is a partially undifferentiable function, and has no gradient at the points where it is undifferentiable. However, it has a hypogradient or a general gradient under certain conditions. The definition of a hypogradient and an optimization condition for an optimization problem will be described below.
[0123] In the optimization problem

$$\max_{x} f(x) \tag{28}$$

regarding $x \in R^n$ of a concave function f, if the concave function f is undifferentiable with respect to $x_0$, the hypogradient is a set of z which satisfies:

$$f(x) \le f(x_0) + z^T (x - x_0), \quad \forall x \in X$$

Using that set $\partial f(x_0)$, the optimization condition for the optimization problem is expressed by:

$$0 \in \partial f(x_0)$$

[0124] As the functions $h_{ON}$, $-h_{OFF}$ are concave in a one-dimensional space $R^1$, there is obviously a hypogradient of φ in $w \in \mathrm{int}\, X$. Using a hypogradient set $\partial\phi$ of φ, the optimization condition for optimizing an undifferentiable concave function can be expressed as follows:

$$0 \in \partial\phi(w^0) \tag{29}$$

where $w^0$ represents an optimum solution.

If

$$\phi_{ON}(w; x) = h_{ON}(x^T w - \theta)$$

and

$$\phi_{OFF}(w; x) = h_{OFF}(x^T w - \theta),$$

then

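$$\partial\phi(w) = \Big\{ \sum_{x \in X_{ON}} \nabla\phi_{ON}(w; x) - \sum_{x \in X_{OFF}} \nabla\phi_{OFF}(w; x) \ \Big|\ \nabla\phi_{ON}(w; x) \in \partial\phi_{ON}(w; x),\ \nabla\phi_{OFF}(w; x) \in \partial\phi_{OFF}(w; x) \Big\} \tag{30}$$

where

$$\partial\phi_{ON}(w; x) = \begin{cases} \{0\} & (x^T w > \theta) \\ \mathrm{co}\{0, x\} & (x^T w = \theta) \\ \{x\} & (x^T w < \theta) \end{cases} \qquad \partial\phi_{OFF}(w; x) = \begin{cases} \{x\} & (x^T w > \theta) \\ \mathrm{co}\{0, x\} & (x^T w = \theta) \\ \{0\} & (x^T w < \theta) \end{cases} \tag{31}$$

co indicates a convex hull defined as follows:

$$\mathrm{co}\{0, x\} = \{\lambda x \mid 0 \le \lambda \le 1\} \tag{32}$$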

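From the equations (31), (32),

$$\partial\phi_{ON}(w; x_i) = \{\lambda_i x_i\} \ (i \in I_{ON}), \qquad \partial\phi_{OFF}(w; x_i) = \{\lambda_i x_i\} \ (i \in I_{OFF}) \tag{33}$$

where, with respect to $i \in I_{ON}$,

$$\lambda_i = 0 \ (x_i^T w > \theta), \qquad \lambda_i \in [0, 1] \ (x_i^T w = \theta), \qquad \lambda_i = 1 \ (x_i^T w < \theta),$$

and with respect to $i \in I_{OFF}$,

$$\lambda_i = 1 \ (x_i^T w > \theta), \qquad \lambda_i \in [0, 1] \ (x_i^T w = \theta), \qquad \lambda_i = 0 \ (x_i^T w < \theta). \tag{34}$$

From the equations (30), (33), (34),

$$\partial\phi(w) = \Big\{ \sum_{i \in I_{ON}} \lambda_i x_i - \sum_{i \in I_{OFF}} \lambda_i x_i \Big\} \tag{35}$$

From the equations (29) and (35), there is $\lambda_i$ which satisfies the equations (34) and the following equation:

$$\sum_{i \in I_{ON}} \lambda_i x_i - \sum_{i \in I_{OFF}} \lambda_i x_i = 0 \tag{36}$$

Furthermore, $0 \le \lambda_i \le 1$ is satisfied with respect to all i's.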
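The case analysis of the equations (34) and the condition (36) can be tried out numerically. The sketch below is an illustration under stated assumptions: the XOR patterns, the threshold θ = 1 and the trial point w = (0.75, 0.75) are hand-picked for the example, the function names are hypothetical, and on the boundary $x^T w = \theta$ (where any value in [0, 1] is allowed) the sketch simply picks 0.

```python
def lambda_coefficients(x_on, x_off, w, theta):
    """Coefficients from the case analysis: 1 for inadmissible patterns, 0 for admissible ones.

    On the boundary (x.w == theta) any value in [0, 1] is allowed; this sketch picks 0.
    """
    def dot(x):
        return sum(xi * wi for xi, wi in zip(x, w))

    lam_on = [1.0 if dot(x) < theta else 0.0 for x in x_on]
    lam_off = [1.0 if dot(x) > theta else 0.0 for x in x_off]
    return lam_on, lam_off


def condition_residual(x_on, x_off, lam_on, lam_off):
    """Left-hand side of the optimization condition: sum_ON lam*x - sum_OFF lam*x."""
    n = len(x_on[0])
    return [sum(l * x[j] for l, x in zip(lam_on, x_on))
            - sum(l * x[j] for l, x in zip(lam_off, x_off))
            for j in range(n)]


x_on, x_off = [(0, 1), (1, 0)], [(0, 0), (1, 1)]   # XOR
w, theta = (0.75, 0.75), 1.0                        # hand-picked trial point
lam_on, lam_off = lambda_coefficients(x_on, x_off, w, theta)
residual = condition_residual(x_on, x_off, lam_on, lam_off)
print(lam_on, lam_off, residual)
```

At this trial point the residual vanishes, and the patterns with positive coefficients are (0,1), (1,0) and (1,1).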

[0125] If the original problem (4) is linearly separable, then

$$\partial\phi_{ON}(w; x_i) = \{0\}, \ i \in I_{ON}$$

$$\partial\phi_{OFF}(w; x_i) = \{0\}, \ i \in I_{OFF}$$

in the optimum solution $w^0$ to the problem (26).
[0126] If the original problem (4) is linearly inseparable, then since an inadmissible pattern exists, the hypogradient set $\partial\phi_{ON}$ or $\partial\phi_{OFF}$ has nonzero elements, as can be understood from the equations (34). Therefore, there exists nonzero $\lambda_i$ which satisfies the equation (36). That is,

$$\lambda \gneq 0 \tag{37}$$

[0127] In order to introduce the relationship between λ which satisfies the optimization condition formula and the dual problem (19), the following theorem is obtained:

[Theorem 3]:

[0128] If there is no admissible solution w to the original problem (4):

$$x_i^T w - \theta > 0, \ i \in I_{ON}$$

$$x_i^T w - \theta < 0, \ i \in I_{OFF}$$

and if the following equation is satisfied:

$$\sum_{i \in I_{ON}} \lambda_i x_i - \sum_{i \in I_{OFF}} \lambda_i x_i = 0, \qquad \lambda \gneq 0$$

then the following relationship is satisfied:

$$\sum_{i \in I_{ON}} \lambda_i \ge \sum_{i \in I_{OFF}} \lambda_i$$

(Proof)

[0129] The foregoing is apparent from the following Auxiliary Theorem 2:

If there is no solution x for $Ax > b$ with respect to a given matrix $A \in R^{p \times n}$, then the solution $y \in R^p$ for

$$A^T y = 0, \quad y \gneq 0$$

satisfies

$$b^T y \ge 0$$

(Proof)

[0130] When there is a solution $y^0$ to

$$A^T y = 0, \quad y \gneq 0$$

$b^T y^0 < 0$, which is a negation of $b^T y^0 \ge 0$, is assumed, introducing a contradiction.

[0131] Since the fact that there is no solution x for $Ax > b$ is equivalent to the fact that there is a solution y for

$$A^T y = 0, \quad y \gneq 0, \quad b^T y \ge 0,$$

there is no $y^0$ for

$$A^T y^0 = 0, \quad y^0 \gneq 0, \quad \text{and} \quad b^T y^0 < 0$$

[0132] If there is no admissible solution to the original problem (4), i.e., if the learning patterns $X_{ON}$, $X_{OFF}$ are linearly unseparable, then, from the equations (19), (34), (36), (37) and Theorem 3,

$$\Gamma \subset \Lambda$$

where Γ is a set of λ which satisfies the optimization condition equation (36) for the optimization problem (26) and the expression (37), and Λ is a solution set for the dual problem (19). Since the solution to the dual problem (19) satisfies the optimization condition equation (36) and the expression (37), the following relationship is satisfied:

$$\Gamma \supset \Lambda$$

Therefore,

$$\Gamma = \Lambda$$

Consequently, in the case where the original problem is linearly unseparable, λ that is determined from the optimum solution to and the optimum condition for the optimization problem (26) is equal to the solution to the dual problem.

(3)-4 Extraction of a linearly unseparable core pattern:

[0133] It can be seen from the above theory that if the original problem is linearly inseparable, then λ that satisfies the equations (31) with respect to the solution $w^0$ to the optimization problem (26) is determined, and the positive component of λ corresponds to a linearly inseparable core pattern that corresponds to $w^0$. However, the problem (26) may not necessarily have an undifferentiable point, and an optimum solution may not always be determined for the problem (26). It will be shown below that LUCPS (linearly inseparable core pattern set) can be extracted even when no optimum solution is determined.

[0134] An algorithm for solving the optimization problem (26) will be described below.
Algorithm β (algorithm for solving the problem (26)):

[Step 1]

[0135] A certain initial point $w^1$ is appropriately selected, and an iteration number d is set to d = 1.

[Step 2]

[0136] $h_{ON}(x^T w^d - \theta)$; $x \in X_{ON}$ and $h_{OFF}(x^T w^d - \theta)$; $x \in X_{OFF}$ are calculated. If there exists even one pattern $x_p$ which satisfies:

$$h_{ON}(x_p^T w^d - \theta) \ne 1, \quad x_p \in X_{ON}$$

or

$$h_{OFF}(x_p^T w^d - \theta) \ne 0, \quad x_p \in X_{OFF}$$

then control proceeds to a next step. If not, $w^d$ is regarded as the solution to the problem (26), and the algorithm is finished.

[Step 3]

[0137] With respect to p in the step 2, the following correction is made:

$$w^{d+1} = w^d + \alpha^d \Delta w^d(x_p) \tag{38}$$

where

$$\alpha^d = \text{[expression for the step width is illegible in the source]} \tag{39}$$

$$\Delta w^d(x_p) = \nabla_w \phi_{ON}(w; x_p), \quad x_p \in X_{ON}$$

$$\Delta w^d(x_p) = \nabla_w \phi_{OFF}(w; x_p), \quad x_p \in X_{OFF}$$

where $\nabla_w \phi_{ON}(w; x_p)$ and $\nabla_w \phi_{OFF}(w; x_p)$ are expressed as follows:

$$\nabla_w \phi_{ON}(w; x_p) = x_p, \qquad \nabla_w \phi_{OFF}(w; x_p) = x_p \tag{40}$$



[Step 4]

[0138] The algorithm is finished if any one of the following finishing conditions a, d, e is satisfied. If not, then $w^d$ is stored as $w_0$ ($w_0 \leftarrow w^d$) when the iteration number d = $d_0$, where $d_0$ is a certain number. When d > $d_0$, the algorithm is finished if the finishing condition b is satisfied. Otherwise, the algorithm is finished if the finishing condition c is satisfied. When none of these cases applies, the iteration number d is updated by d ← d + 1, and control goes back to the step 2. The finishing conditions a through e are defined as follows:

(Finishing condition a) A sufficiently large positive integer $j_0$ is determined, and the algorithm is finished if there is j which satisfies:

$$|w^d - w^{d-j}| < \zeta$$

with respect to j = 1, ..., $j_0$, where ζ is a sufficiently small positive number.
(Finishing condition b) The algorithm is finished if $w^d = w_0$ when d > $d_0$.

(Finishing condition c) The algorithm is finished when d > $d_0$. $d_0$ is a positive integer which is somewhat larger than the maximum iteration number at the time an n-dimensional linearly separable problem is solved.
(Finishing condition d) The algorithm is finished if a hypogradient set $\partial\phi(w^d)$ of a function φ with respect to $w^d$ contains a zero vector.

(Finishing condition e) The algorithm is finished if $\phi(w^d) < \phi(w^{d-1})$.

[0139] The finishing conditions are the most difficult part to determine in the algorithms used in the present invention. Whether an actual pattern is linearly separable or not is determined based on the finishing conditions. The finishing conditions a, b serve to check if a change $\Delta w^d$ in w in one cycle or more becomes 0 or not when the value d becomes larger than a certain value $d_0$, i.e., when the convergence has sufficiently progressed. The finishing condition d is used to halt the algorithm when it cannot be determined, based on the finishing conditions a, b, whether an actual pattern is linearly inseparable or not. The learning time in the present invention is governed by whether the finishing conditions are good or bad. In a numerical example described later on, $d_0 = 3$.

[0140] The method of determining the step width according to the equation (39) is based on the idea that a minimum width required is given so as to make an inadmissible pattern $x_p$ at a present point $w^d$ admissible with respect to the original problem.

[0141] The transformation functions $h_{ON}$, $h_{OFF}$ are essentially undifferentiable at 0, and a hypogradient must be given thereto as indicated by the equations (31). However, it is generally difficult to positively determine a hypogradient set. To avoid the difficulty, a hypogradient is given according to the equations (40) in view of the nature of the neuron transformation function.
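The overall behavior of the algorithm β can be sketched as a perceptron-like loop. The code below is a simplified illustration, not the procedure as specified: the step width formula is an assumption (just enough to push the inadmissible pattern past the hyperplane by a margin), the update direction for OFF patterns is taken as $-x_p$ so that the pattern moves toward the OFF side, and the finishing conditions a through e are replaced by a plain cap on the number of sweeps. The names `algorithm_beta_sketch`, `margin` and `max_sweeps` are hypothetical.

```python
def algorithm_beta_sketch(x_on, x_off, theta=1.0, d0=3, max_sweeps=50, margin=0.5):
    """Perceptron-like search; returns (w, separable, registered_patterns)."""
    patterns = [(x, True) for x in x_on] + [(x, False) for x in x_off]
    w = [0.0] * len(patterns[0][0])
    registered = set()
    for d in range(1, max_sweeps + 1):
        corrected = False
        for x, is_on in patterns:
            z = sum(xi * wi for xi, wi in zip(x, w)) - theta
            if (z > 0) if is_on else (z < 0):
                continue                      # pattern is admissible
            norm_sq = sum(xi * xi for xi in x)
            if norm_sq == 0:
                continue                      # the zero vector cannot be moved by any w
            # ASSUMED step width: just enough to push x past the hyperplane by `margin`.
            alpha = (abs(z) + margin) / norm_sq
            direction = 1.0 if is_on else -1.0
            w = [wi + direction * alpha * xi for wi, xi in zip(w, x)]
            corrected = True
            if d > d0:
                registered.add(tuple(x))      # still being corrected after d0 sweeps
        if not corrected:
            return w, True, set()             # every pattern is admissible
    return w, False, registered


# AND is linearly separable; XOR is not.
w_and, and_ok, _ = algorithm_beta_sketch([(1, 1)], [(0, 0), (0, 1), (1, 0)])
w_xor, xor_ok, xor_core = algorithm_beta_sketch([(0, 1), (1, 0)], [(0, 0), (1, 1)])
print(and_ok, xor_ok, sorted(xor_core))
```

For XOR the weight vector cycles without settling, and the patterns registered after $d_0$ sweeps are the ones that keep being corrected.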

[0142] If original input/output patterns are linearly separable, i.e., if the original problem (4) has an admissible solution, there is determined a solution $w^0$ for achieving all input/output patterns in the same manner as the learning process for perceptrons, according to the algorithm β. If original input/output patterns are linearly inseparable, a λ set obtained from the optimization condition for the problem (26) is equal to a solution set for a dual problem. The path of $w^d$ converges up to near an optimum solution that satisfies the optimization condition equation (36) for the problem (26) and the expression (37), but does not stop at the optimum solution, and vibrates in or around the optimum solution. Even if an optimum solution is obtained, it is difficult to positively determine a hypogradient set of an objective function of the problem (26). Therefore, it is also difficult to determine whether $w^d$ satisfies the optimization condition.
[0143] The algorithm according to the present invention does not need an optimum solution, but may determine LUCPS (linearly inseparable core pattern set). In most cases, even if not all of the LUCPS is obtained, a subset thereof may be obtained and an OFF pattern to be transformed may be selected from the subset.

[0144] According to the algorithm β, $w^d$ vibrates in the vicinity of an optimum solution with respect to a sufficiently large d (d > $d_0$). In the vicinity of an optimum solution, an inadmissible component of the original problem (4) is considered to correspond to a linearly inseparable core pattern.

[0145] According to the method (39) of determining a corrected step width with respect to an inadmissible pattern, the step width with respect to the present inadmissible pattern is determined such that an output for at least the present pattern is equalized to a target output.

[0146] Then, interference occurs between linearly inseparable core patterns, and $w^d$ vibrates between regions of w which satisfy a target output between these patterns. Thus, if $w^d$ starts to vibrate in the vicinity of an optimum solution after it has converged to a certain extent, then components or patterns in the equation (4) which are made inadmissible subsequently are considered to be an element of LUCPS. The LUCPS should be obtained by registering these patterns.
[0147] A suitable OFF pattern is selected from the obtained pattern set, and changed to an ON pattern, thus reducing the number of elements of the LUCPS.

[0148] If the conditional expression (13) of the algorithm for the linearly separable allocation method described above under (2) is not satisfied, then any reduction in the number of elements of $X^k$ is not assured. Selection of an OFF pattern to be transformed in order to satisfy the above condition is made according to the following Rule A or Rule B:

[Rule A]

[0149] An OFF pattern which is most distant from the origin shall be selected.
[Rule B]

[0150] An OFF pattern whose inner product with $w^d$ is largest shall be selected.
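The two selection rules can be sketched as small helper functions. This is an illustration under assumptions: the candidates are taken to be the OFF patterns that were registered as core patterns, Rule B is read as using the current weight vector for the inner product, and the function names and sample data are hypothetical.

```python
def select_off_pattern_rule_a(registered, x_off):
    """Rule A: among registered OFF patterns, pick the one farthest from the origin."""
    candidates = [x for x in x_off if tuple(x) in registered]
    return max(candidates, key=lambda x: sum(xi * xi for xi in x))


def select_off_pattern_rule_b(registered, x_off, w):
    """Rule B (assuming the inner product is taken with the current weight vector w)."""
    candidates = [x for x in x_off if tuple(x) in registered]
    return max(candidates, key=lambda x: sum(xi * wi for xi, wi in zip(x, w)))


registered = {(0, 1), (1, 0), (1, 1)}     # hypothetical core patterns for XOR
x_off = [(0, 0), (1, 1)]
chosen_a = select_off_pattern_rule_a(registered, x_off)
chosen_b = select_off_pattern_rule_b(registered, x_off, w=(0.25, 0.25))
print(chosen_a, chosen_b)
```

In this XOR example both rules pick (1,1); switching it to ON leaves the OR pattern, which is linearly separable.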

[0151] Linearly separable patterns are obtained by transforming patterns which cause linear inseparability, one by one. Separating hyperplanes for the linearly separable allocation algorithm are determined from the obtained linearly separable patterns.

[0152] The above process may be incorporated in the linearly separable allocation algorithm α.
[0153] FIGS. 8 and 9 are flowcharts of a learning method according to the present invention.

[0154] As shown in FIG. 8, first, input patterns (vectors) $x_p$ and output patterns (scalars) $t_p$ (p = 1, ..., m, where m is the number of patterns) are given in a step 1. Then, a target output pattern t is set to an initial value $t^1$ in a step 2, and an iteration number k is set to 1 in a step 3. A target output pattern $t^k$ with respect to a kth hidden neuron is substituted in a linearly separable output pattern $t^k_M$ in a step 4, and a transformation routine is carried out in a step 5. The transformation routine will be described later with reference to FIG. 9.

[0155] Thereafter, a differential pattern which is produced by subtracting $t^k$ from $t^k_M$ is given as a (k+1)th output pattern $t^{k+1}$ in a step 6, and the steps 4 through 6 are repeated until $t^{k+1} = 0$. Specifically, after the step 6, a step 7 determines whether $t^{k+1} = 0$. If $t^{k+1} \ne 0$, then the iteration number k is incremented by 1 in a step 8, and control returns to the step 4.
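The differential-pattern loop of FIG. 8 can be traced on XOR. The sketch below assumes that the transformation routine has already returned the linearly separable patterns shown (these are hand-chosen for the example, not computed), and the function name `next_target` is hypothetical.

```python
def next_target(t_separable, t_current):
    """Differential pattern t^(k+1) = t_M^k - t^k, element by element."""
    return [tm - t for tm, t in zip(t_separable, t_current)]


# XOR over the inputs (0,0), (0,1), (1,0), (1,1).
t1 = [0, 1, 1, 0]
t1_sep = [0, 1, 1, 1]      # hypothetical: OFF pattern (1,1) switched to ON, giving OR
t2 = next_target(t1_sep, t1)
t2_sep = t2                # (0,0,0,1) is AND, which is already linearly separable
t3 = next_target(t2_sep, t2)
print(t2, t3)
```

The loop stops at the third pattern because it is all zeros, so two hidden neurons suffice in this example.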

[0156] When $t^{k+1} = 0$, the neural network shown in FIG. 2 is constructed with the weights (association coefficients) determined as described above between output and hidden layers, in a step 9.
[0157] The transformation routine will be described below.

[0158] As shown in FIG. 9, first a weight $w^1$ is set to 0 in a step 11. Then, a pattern number p and the iteration number d of the algorithm β are set to 1, and a variable sw indicative of a wrong answer (described later) is set to 0 in a step 12. A step 13 determines whether the linearly separable target output $t^k_M$ agrees with an actual output $h(x_p^T w^d - \theta)$ with respect to a pattern p. If not, then the weight $w^d$ is altered or corrected according to the equations (38) and (39), and the variable sw is set to 1 in a step 14. Thereafter, a step 15 determines whether $w^d$ is vibrating (d > $d_0$) or not. If vibrating, the pattern p is registered in a step 16.

[0159] The steps 13 through 16 are executed with respect to m patterns. Specifically, after the pattern p is registered, or if the linearly separable target output $t^k_M$ agrees with an actual output $h(x_p^T w^d - \theta)$, a step 17 determines whether p < m or not. If p < m, then p and d are incremented by 1 in a step 18, and control goes back to the step 13.
[0160] When p exceeds m, a step 19 determines whether any one of the finishing conditions a through e is satisfied or not. If satisfied, then the pattern p is registered as a linearly unseparable pattern in a step 20. One OFF pattern is selected from the registered patterns as a pattern p' in a step 21. A target output $t^k_{Mp'}$ for the selected pattern p' is set to 1 in a step 22, from which the processing goes back to the step 12.

[0161] If none of the finishing conditions are satisfied in the step 19, then a step 23 determines whether the variable sw = 0 or not. If not, control returns to the step 12, and if yes, the pattern is registered as linearly separable in a step 24.

(4) Algorithm for the linearly separable allocation method (LISA):

(4)-1 Algorithm:

[0162] The linearly separable allocation algorithm α described above under (2) and the algorithm β for solving the optimization problem described under (3)-4 are joined to each other, making algorithms γ, $\delta^k$ as described below.
Algorithm γ:

[Step 1]

[0163] $X^1_{ON} = X_{ON}$, $X^1_{OFF} = X_{OFF}$. The iteration number k is set to k = 1.
[Step 2]
[0164]

Xk 
.V, . - ^ ON- 

[Step 3] 

[01 65] Using the algorithm 6^ (described later on), it is checked whether X^q,^, Xk - Y^q^ are linearly separable or not. 
[Step 4]

[0166] If linearly separable, then $w^0$ obtained by the algorithm $\delta^k$ is set to $w^k$, and the processing goes to a step 5.
If linearly inseparable, then an element whose norm is maximum is selected from an intersection of sets {XjIXj G 
Ilu}» {X*^ - X*^on} obtained according to the algorithm S^. After Xk u» {x^}, control goes back to the step 4. 

[Step 5] 

[0167] If $X^k = X^k_{ON}$, then control goes to a step 6. If not, then

$$X^{k+1}_{ON} \leftarrow X^k - X^k_{ON}, \qquad X^{k+1}_{OFF} \leftarrow X^k_{ON}$$

and the iteration number k is updated into k + 1. The processing then goes to the step 2.
[Step 6]

[0168] The three-layer neural network model expressed by the equations (9), (10) is constructed, and the algorithm
γ is ended.
Algorithm δ^k:

[Step 1]

[0169] A certain initial point is selected. $Y^k_{ON} = X^k_{ON}$, $Y^k_{OFF} = X^k - X^k_{ON}$, and the linearly unseparable core pattern
set $I_{LU} = \emptyset$ (empty set). The iteration number d is set to d = 1.

[Step 2]

[0170] $h_{ON}(x^T w^d - \theta)$ [$x \in Y^k_{ON}$] and $h_{OFF}(x^T w^d - \theta)$ [$x \in Y^k_{OFF}$] are calculated. If even one pattern p exists which
satisfies:

$$h_{ON}(x_p^T w^d - \theta) \neq 1 \quad (x_p \in Y^k_{ON})$$

or

$$h_{OFF}(x_p^T w^d - \theta) \neq 0 \quad (x_p \in Y^k_{OFF}),$$

then control goes to a next step. If not, $w^d$ is set to $w^o$, and the algorithm is ended.
[Step 3]

[0171] All patterns p in the step 2 are corrected according to the equations (38), (39). If $d > d_0$, then the patterns p
are added to the linearly inseparable core pattern set $I_{LU}$. That is,

$$I_{LU} \leftarrow I_{LU} \cup \{p\}$$

$d_0$ is a maximum value for the convergence iteration number at the time there is a solution to a predetermined original
problem.

[Step 4] 

[0172] If any of the finishing conditions a, d, e is satisfied, then the algorithm is finished. If not satisfied, when $d = d_0$,
$w^d \leftarrow w_0$. If the finishing condition b is satisfied, then the algorithm is finished. If not, then the algorithm is finished
when the finishing condition c is satisfied. If the finishing condition c is not satisfied, then $d \leftarrow d + 1$, and the processing
goes back to the step 2.
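
The sweep-and-correct loop of steps 2 through 4 can be illustrated with a minimal Python sketch. It is not the algorithm δ^k itself: it assumes a step-type transformation function, a fixed positive threshold `theta`, and a fixed correction step of +1 for ON patterns and -1 for OFF patterns in place of the equations (38), (39), which are not reproduced in this section. The names `train_threshold_unit`, `output` and `d0` are hypothetical. Patterns still misclassified after `d0` sweeps are returned, loosely playing the role of candidates for the linearly unseparable core pattern set.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[Sequence[int], int]  # (input vector, target output)


def step(z: float) -> int:
    """Assumed transformation function: ON (1) when z >= 0, else OFF (0)."""
    return 1 if z >= 0 else 0


def output(w: Sequence[float], x: Sequence[int], theta: float) -> int:
    """Actual output h(x^T w - theta) of a single threshold neuron."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def train_threshold_unit(
    patterns: List[Pattern], theta: float = 2.0, d0: int = 50
) -> Tuple[List[float], List[Pattern]]:
    """Sweep the patterns at most d0 times.

    Returns the weights and the patterns that are still misclassified;
    an empty list means the set was learned as linearly separable.
    """
    w = [0.0] * len(patterns[0][0])
    for _ in range(d0):
        wrong = [(x, t) for x, t in patterns if output(w, x, theta) != t]
        if not wrong:
            return w, []
        for x, t in wrong:
            # Assumed fixed step; the source uses equations (38), (39) instead.
            alpha = 1.0 if t == 1 else -1.0
            w = [wi + alpha * xi for wi, xi in zip(w, x)]
    return w, [(x, t) for x, t in patterns if output(w, x, theta) != t]
```

For example, the three nonzero two-input patterns with targets (1, 1, 1) are learned within two sweeps, whereas the XOR-like targets (1, 1, 0) leave misclassified patterns however many sweeps are allowed, because no weights can satisfy them with a positive threshold.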

(4)-2 Speeding-up of algorithm:

[0173] Addition of improvements, described below, to the algorithms makes it possible to speed up the learning
process.

1. The weights are calculated as integer-type weights. It is important that the threshold not be too small compared
to the number of input neurons. If the threshold were too small, the iteration number would be unnecessarily
increased due to a quantization error, possibly causing infinite recurrence.

2. In the step 3 of the algorithm γ, before executing the algorithm δ^k, any pattern j which satisfies

$$x_j \ge x_i$$

with respect to $i \in I_{ON}$, $j \in I_{OFF}$ is changed from an OFF pattern to an ON pattern.

[0174] The significance of the speeding-up process under 2 above is as follows:

This speeding-up process is effective to reduce the number of times that the algorithm is carried out to discover
a linearly unseparable core pattern set $I_{LU}$. However, since patterns other than $I_{LU}$ may be rendered ON, the algorithm
may possibly be subject to recurrence. No recurrence took place with a four-input pattern in an experiment described
later on.

[0175] FIGS. 10(a) through 10(c) show a transformation for the speeding-up process. White dots indicate OFF patterns,
and black dots ON patterns. As shown in FIG. 10(a), those OFF patterns (on the righthand or upper side of
dotted lines) which are equal to or larger than the ON patterns are transformed into ON patterns. Thereafter, as shown
in FIG. 10(b), the ON and OFF patterns are clearly separated from each other. However, as shown in FIG. 10(c), there
is a transformation in which the number of changed patterns is smaller.

[0176] When there are an OFF pattern set and an ON pattern as shown in FIG. 10(a), since a line interconnecting the
origin and the ON pattern passes through the convex hull of the OFF pattern set, these patterns are linearly unseparable.
According to the above speeding-up process, all OFF patterns on the lefthand or upper side of the dotted lines are
transformed into ON patterns, as shown in FIG. 10(b), and become linearly separable. This transformation is not a
minimum transformation for making linearly separable patterns. The transformation may be less if effected as shown
in FIG. 10(c).
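
The transformation under 2 can be sketched in a few lines of Python. This is an illustrative reading only: it takes "equal to or larger than" as an element-by-element comparison, and the function names `dominates` and `promote_dominating_off` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, ...]


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True when every element of a is equal to or greater than that of b."""
    return all(ai >= bi for ai, bi in zip(a, b))


def promote_dominating_off(
    on_patterns: List[Vec], off_patterns: List[Vec]
) -> Tuple[List[Vec], List[Vec]]:
    """Turn every OFF pattern that dominates some ON pattern into an ON pattern.

    Returns the enlarged ON set and the remaining OFF set.
    """
    promoted = [x for x in off_patterns
                if any(dominates(x, o) for o in on_patterns)]
    remaining = [x for x in off_patterns if x not in promoted]
    return list(on_patterns) + promoted, remaining
```

With ON patterns (0, 1) and (1, 0) and the single OFF pattern (1, 1), the OFF pattern is promoted and the resulting ON set is linearly separable from the (now empty) OFF set. As the text notes, this promotion is not necessarily the smallest change that achieves separability.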

[0177] FIGS. 11 and 12 show the processes according to the algorithms γ, δ^k in greater detail. FIG. 13 illustrates the
definition of letters and symbols in FIGS. 11 and 12. FIGS. 14 through 16 are a detailed flowchart of the algorithm γ
including the speeding-up process shown in FIGS. 10(a) through 10(c). FIG. 17 is a detailed flowchart of the algorithm
δ^k. FIGS. 14 through 17 are a specific representation of the processes shown in FIGS. 8 and 9. The steps shown in
FIGS. 14 through 17 are basically the same as those shown in FIGS. 8 and 9, and will not be described in detail.

(4)-3 Expansion of the algorithms: 

[0178] According to the above algorithms, the neuron threshold θ is of a fixed positive value. Therefore, $h(w^T x - \theta)$
necessarily becomes 0 at the origin 0; that is, when all inputs are 0, the output has a value of 0. To achieve all patterns including
those in which all inputs are 0, a bias neuron 4 may be added as shown in FIG. 18. The bias neuron 4 is always ON
so that the output can be ON even when all the neurons of the input layer 1 are OFF. The neural network arrangement
shown in FIG. 18 makes a learning process possible even if all inputs have a value of 0.
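
The effect of the bias neuron can be checked with a small Python sketch. It assumes a step-type transformation function that is ON when its argument is zero or positive; the helper names `neuron` and `with_bias` are hypothetical and the weights are chosen by hand for illustration.

```python
from typing import Sequence, Tuple


def step(z: float) -> int:
    """Assumed transformation function: ON (1) when z >= 0, else OFF (0)."""
    return 1 if z >= 0 else 0


def neuron(w: Sequence[float], x: Sequence[int], theta: float) -> int:
    """Output h(w^T x - theta) of a threshold neuron."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def with_bias(x: Sequence[int]) -> Tuple[int, ...]:
    """Append the state of an always-ON bias neuron to the input pattern."""
    return tuple(x) + (1,)


THETA = 2.0  # fixed positive threshold

# Without a bias neuron the all-zero input always gives 0, whatever the weights.
no_bias_output = neuron([5.0, 5.0], (0, 0), THETA)

# With a bias neuron, a weight of THETA on the bias input turns the output ON.
bias_output = neuron([0.0, 0.0, THETA], with_bias((0, 0)), THETA)
```

Here `no_bias_output` is 0 and `bias_output` is 1, since the always-ON bias input contributes its weight to the sum even when every input neuron is OFF.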

[0179] In actual biological environments, such a neuron is of a circuit arrangement as shown in FIG. 19 which has
a synaptic coupling for self feedback. If such a neuron is used, all input/output patterns including those in which all
inputs are 0 can be learned according to the above algorithms.

[0180] FIG. 19 shows a bias neuron unit with a self feedback loop. Once an ON input signal is applied to the bias 
neuron unit, the bias neuron unit keeps an ON value at all times. The value of the bias neuron unit may be made OFF 
if an inhibitory neuron is connected to the input thereof. 

[0181] The neural network for executing the above algorithms basically has multiple inputs and a single output. The
neural network may be expanded into a multiple-input multiple-output neural network as shown in FIG. 20. Since
synaptic couplings between output and hidden layers 3, 2 do not interconnect all the neurons in these layers, the
learning process can be carried out at high speed.

[0182] The present invention is not limited to the above three-layer network structures, but may be applied to a 
multilayer neural network structure having a plurality of hidden layers 2 as shown in FIG. 21 . 

[0183] The neural network shown in FIG. 21 comprises an input layer 1, an output layer 3, and three hidden layers
2. The hidden neurons having state values $x^1_1, \ldots, x^1_{k_1}$; $x^2_1, \ldots, x^2_{k_2}$; ...; $x^p_1, \ldots, x^p_{k_p}$ in the first hidden layer from the
input layer 1, and hidden neurons having state values $z_1, \ldots, z_r$ in the third hidden layer adjacent to the output layer
3 are generated as required as the learning process progresses. The hidden neurons having state values $x_1, \ldots, x_p$
in the second hidden layer are provided in advance as output neurons (corresponding to the output neuron having a
state value y in FIG. 3) with respect to the hidden neurons in the first hidden layer.

<Experiments>

[0184] To ascertain the effectiveness of LISA, it was checked whether it can learn all patterns including linearly
inseparable patterns. A numerical experiment was also conducted to see if LISA causes an NP (nondeterministic
polynomial) problem or not. The results of these experiments were compared with those of learning processes according
to back propagation. The experiments employed "EWS, SUN4/260" manufactured by Sun Microsystems.

1. Inspection of the learning capability of a neural network:

[0185] Generally, it is impossible to determine in practice whether a neural network can learn all patterns. A four-input,
one-output neural network was tested to see if it can learn all input/output patterns. Since the LISA has only a
positive threshold, the output would be 0 when all inputs are 0 unless the neural network is expanded into a multiple-input
multiple-output neural network as shown in FIG. 20. The learning patterns assigned one of two outputs (0, 1)
to each of 15 input patterns, excluding the input pattern in which all four inputs are 0. The number of all input/output patterns
was 2^15 = 32,768.
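
The pattern count can be reproduced directly. The short Python sketch below enumerates the four-input binary patterns, drops the all-zero input, and counts the possible assignments of a binary target to the remaining inputs; the variable names are illustrative only.

```python
from itertools import product

# All four-input binary patterns except the one in which every input is 0.
inputs = [x for x in product((0, 1), repeat=4) if any(x)]

# Each remaining input pattern may be given target 0 or 1 independently.
n_input_patterns = len(inputs)
n_io_patterns = 2 ** n_input_patterns
```

This gives 15 input patterns and 2^15 = 32,768 input/output patterns, matching the figure used in the experiment.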

[0186] A back-propagation neural network used for comparison had two layers except for an input layer, and included
4 neurons in the input layer, 8 neurons in the hidden layer, and one neuron in the output layer. Learning parameters
were a learning step width η = 0.9 and a bias term coefficient α = 0.9. The learning process according to back propagation
compared a target output and an output pattern of the network in each iteration, and ended when all of the
compared outputs were the same. The output of the network was converted into a binary signal using a threshold of
0.5 for comparison with the target output. Therefore, if the output was 0.6, a binary value of 1 was compared with the
target output, and if the output was 0.3, a binary value of 0 was compared with the target output.

[0187] When the iteration number reached 3,000, the learning of the patterns was interrupted as no solution that
would satisfy all the patterns was considered to exist. The results of the experiments are given in Table 1 below.



Table 1: Learning of 32,768 patterns

Algorithm                       Back propagation    LISA
Calculation time                162,530.4 sec.      81.9 sec.
Wrong answers                   4,085               0
Percentage of correct answers   88%                 100%

[0188] As can be seen from Table 1, the percentage of correct answers according to the back propagation process
was 88%, whereas the percentage of correct answers according to LISA was 100%. The time required for learning
the patterns according to the back propagation process was about 1,984 times the time required for learning the patterns
according to LISA. When the iteration number reached 3,000 according to the back propagation process, the learning
process was interrupted as producing wrong answers. The calculation times according to LISA and the back propagation
process while no wrong answers were produced are given in Table 2 below. The results given in Table 2 were
obtained when 100 patterns were checked as they were solved by the back propagation process, i.e., learned in
fewer than 3,000 learning cycles.



Table 2: Learning of 100 patterns

Algorithm                       Back propagation    LISA
Calculation time                102.8 sec.          0.2 sec.
Wrong answers                   0                   0
Percentage of correct answers   100%                100%



[0189] Table 2 indicates that subject to no wrong answers, the LISA is 514 times faster than the back propagation
process. Therefore, even if the back propagation process can learn patterns with the iteration number being less than
3,000, the LISA is about 500 times faster than the back propagation process. However, as described above with respect
to the speeding-up of the algorithms, the learning speed is lower for integer calculations if the threshold is too small.
The threshold should preferably be of a value which is as large as possible depending on the memory capacity of the
computer used and the dimensions of problems. For four-dimensional problems, the maximum number of neurons
generated in the hidden layers of the LISA was 5 during a trial performance on 32,768 patterns.
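
The ratios quoted for Tables 1 and 2 follow from the tabulated values, as the following Python arithmetic check shows. The variable names are illustrative; the numbers are taken from the two tables.

```python
# Values from Table 1 (all 32,768 patterns).
bp_time_all, lisa_time_all = 162_530.4, 81.9
total_patterns, bp_wrong = 32_768, 4_085

# Values from Table 2 (100 patterns that back propagation also solved).
bp_time_100, lisa_time_100 = 102.8, 0.2

speed_ratio_all = bp_time_all / lisa_time_all        # about 1,984
speed_ratio_100 = bp_time_100 / lisa_time_100        # about 514
bp_correct_pct = 100.0 * (total_patterns - bp_wrong) / total_patterns  # about 87.5
```

The back propagation percentage works out to roughly 87.5%, which the table reports rounded as 88%.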

2. Experiment to see if the LISA causes an NP problem:

[0190] Even though the LISA is faster than the back propagation process, if it causes an NP problem when the
number of input patterns increases, then the calculation time increases exponentially, and the LISA will not be practical. An
experiment was conducted to see how the calculation time of the LISA varies with respect to the number of patterns
to be learned.

[0191] The patterns to be learned were 7-input, 1-output patterns, and N patterns determined according to uniform
random numbers were learned, 100 patterns each time. The results are shown in the logarithmic graph of FIG. 22. The
graph was approximated with a polynomial ($aN^b + c$) with respect to each of pattern numbers 0, 25, 75, resulting in
illustrated curves. The polynomial was represented by 1.5 X lO^N^-^s + 0.05. As the pattern number approaches 90, 
the gradient of the equation was larger than the experimental values. For 7 inputs, the calculation time of the LISA was
of a polynomial order, and did not cause an NP problem. For seven-dimensional 127 patterns, the maximum number
of neurons generated in the hidden layers of the LISA was 22 during a trial performance on 100 random patterns.
[0192] FIG. 22 also shows the calculation time of the back propagation process. In this experiment, the back prop- 
agation process was modified as follows: 

When the iteration number exceeded 300, it was considered that the learning process would not converge any 
more, and the association weights were initialized again according to uniform random numbers, and the learning proc- 
ess was started again. This procedure was repeated up to 5 times. It can be seen from this graph that when the pattern 
number is 40 or higher, the LISA is about 100 times faster than the back propagation process.

[0193] In FIG. 22, the LISA (represented by white dots) is about 100 times faster than the back propagation process 
(black dots). Since the graph has a vertical axis representing the logarithm of the time, the amount of calculation of 
either algorithm did not exponentially increase.

[0194] If the calculation time increased exponentially, then any plotted calculation time would be indicated by a
straight line having a certain gradient. The results of the LISA and the back propagation process show that the gradients
thereof were reduced as the pattern number increased. The percentage of correct answers of the back propagation
process suddenly started to decrease when the pattern number exceeded about 40 as shown in FIG. 23. Since the
curve was similar to that of a sigmoid function, it was approximated by a sigmoid function as shown. The percentage
of the LISA was kept at 100% irrespective of the number of patterns. It indicates that the LISA could learn all input/
output patterns generated by uniform random numbers.
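
The argument about gradients on a logarithmic time axis can be illustrated numerically. The Python sketch below uses made-up timing curves (a hypothetical exponential and a hypothetical power law, neither taken from FIG. 22) and computes the gradient of log10(time) between consecutive pattern numbers.

```python
import math
from typing import List, Sequence


def semilog_gradients(ns: Sequence[int], times: Sequence[float]) -> List[float]:
    """Gradient of log10(time) with respect to N between consecutive points."""
    pairs = [(n, math.log10(t)) for n, t in zip(ns, times)]
    return [(l2 - l1) / (n2 - n1)
            for (n1, l1), (n2, l2) in zip(pairs, pairs[1:])]


ns = list(range(10, 101, 10))

# Hypothetical curves for illustration only.
exp_times = [1e-3 * 1.1 ** n for n in ns]    # exponential growth in N
poly_times = [1e-5 * n ** 2.5 for n in ns]   # polynomial (power-law) growth in N

exp_gradients = semilog_gradients(ns, exp_times)
poly_gradients = semilog_gradients(ns, poly_times)
```

The exponential curve has a constant gradient (a straight line on the logarithmic graph), whereas the gradients of the power-law curve shrink as N grows, which is the behaviour the text attributes to both measured curves.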

[0195] As shown in FIG. 23, the percentage of correct answers of the LISA was kept at 100% regardless of the
pattern number increasing. The percentage of correct answers of the back propagation decreased at the rate of a
sigmoid function as the pattern number increased.

[0196] The results of the experiments indicate the following: 

1 . The LISA does not cause an NP problem with respect to the pattern number. 

2. For seven inputs, the LISA is about 100 times faster than the back propagation process. 

3. The number of patterns that can be learned (the percentage of correct answers) by the back propagation process 
decreases at the rate of a sigmoid function as the pattern number increases. 

[0197] When the number of patterns is 50 or more, the percentage of correct answers of the back propagation greatly
decreases, and the actual learning speed of the LISA appears to be much faster than that of the back propagation
process.

[0198] As described above, the learning algorithm for a binary neural network which can learn even linearly unseparable
patterns according to the present invention makes it possible for the neural network to learn all four-input one-output
patterns except for a pattern with all inputs being zero. The learning algorithm of the invention is capable of
learning patterns much faster than the conventional algorithm, and requires a calculation time on a polynomial order.

[0199] With the present invention, as described above, a linearly unseparable input/output pattern is transformed
into several linearly separable patterns, which are then combined by an output layer so that they produce the same
output as the original input/output pattern. Therefore, a neural network can learn all input/output patterns irrespective
of whether they are linearly separable or unseparable.
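
A hand-built two-input example may make this combination concrete. In the Python sketch below, an XOR-like pattern is realized by two hidden threshold neurons, one covering the ON patterns together with the transformed pattern (1, 1) and one covering (1, 1) alone, and an output neuron that combines them with weights of alternate sign. The weights, thresholds, and the step-type transformation function are assumptions chosen for illustration, not values produced by the learning process.

```python
from typing import Sequence


def step(z: float) -> int:
    """Assumed transformation function: ON (1) when z >= 0, else OFF (0)."""
    return 1 if z >= 0 else 0


THETA = 2                          # fixed positive hidden-neuron threshold
HIDDEN_W = [(2, 2), (1, 1)]        # neuron 1: OR-like set, neuron 2: AND-like set
OUTPUT_W = [1, -1]                 # alternate signs in order of generation
OUTPUT_THETA = 0.5                 # 1 > 0.5 (first neuron), 1 - 1 < 0.5 (first two)


def network(x: Sequence[int]) -> int:
    """Three-layer network: two linearly separable sets combined by the output."""
    hidden = [step(sum(wi * xi for wi, xi in zip(w, x)) - THETA)
              for w in HIDDEN_W]
    return step(sum(v * h for v, h in zip(OUTPUT_W, hidden)) - OUTPUT_THETA)


outputs = {x: network(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

The resulting outputs are 0, 1, 1, 0 for the inputs (0, 0), (0, 1), (1, 0), (1, 1): each hidden neuron realizes a linearly separable set, and subtracting the second from the first restores the original unseparable pattern.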

[0200] Since hidden layer neurons are generated as required, if a pattern is linearly separable, one neuron is sufficient,
and if a pattern is linearly unseparable, a minimum number required of neurons or a close number of neurons
are sufficient. Thus, a memory required to achieve desired input/output patterns may be of a minimum capacity required.
Since the number of association weights between neurons to be modified is also minimum, the learning speed is very
high.

[0201] Since the conventional back propagation process uses a sigmoid function as a neuron transformation function,
the weights vary to a large degree as a local optimum point is approached, but to a very small degree at a point far
from the local optimum point. The transformation function used in the present invention has a constant gradient anywhere
at points that are inadmissible for the learning process. Because the step width is of a minimum making the
present patterns admissible, the association coefficients converge at high speed irrespective of the distance from an
optimum point.

[0202] According to the present invention, since hidden layer neurons are automatically generated as required, there 
is no concern required over how many hidden layer neurons are to be employed. 

[0203] The analysis of a neural network that is produced according to a learning process is highly complex and
difficult to achieve. According to the present invention, the network as shown in FIG. 3 is constructed according to the
linearly separable allocation method. Since the network shown in FIG. 3 achieves calculations between sets as indicated
by the equations described with reference to FIG. 2(b), it can easily be analyzed.

[0204] Although certain preferred embodiments of the present invention have been shown and described in detail,
it should be understood that various changes and modifications may be made therein without departing from the scope
of the appended claims.

Claims 

1. A neural network comprising:

an input layer having a plurality of input neurons for receiving an input signal, said input signal containing an
input/output pattern set (P), said plurality of input neurons comprising a predetermined fixed number of input
neurons;

one or more hidden layers having one or more hidden neurons for processing a signal received from said
plurality of input neurons, and

an output layer having one or more output neurons for processing a signal received from said hidden neurons
and for producing an output signal,

said input layer and said hidden layers being coupled to each other by association coefficients (wji) determined
by a learning process,

said hidden layers and said output layer being coupled to each other by association coefficients (wji) determined
such that a sum of association coefficients between said output layer neurons and first through odd-numbered
sequential hidden neurons in a single hidden layer is larger than a predetermined threshold and a
sum of association coefficients between said output neurons and first through even-numbered sequential hidden
neurons in the single hidden layer is smaller than said predetermined threshold,

the neural network automatically generating the hidden neurons of said single hidden layer as required according
to a predetermined process as said learning process progresses,

wherein said given input/output pattern set is implemented by a combination of linearly separable pattern sets
realized by said hidden neurons, each hidden neuron corresponding to a respective one of said linearly separable
patterns, said neural network being characterized in that said learning process is effected by modifying
association coefficients $w^d$ between said hidden layers and said input layer according to the formula

$$w^{d+1} = w^d + \alpha \cdot x$$

where

d is an index indicative of a number of repeated calculations, and
α, a bias coefficient, is indicated by . .

$w^d$, $w^{d+1}$ being weights, $w \in R^n$, x being the input pattern or set of input vectors,

T indicating transposition, and

θ being a neuron threshold value, and $\theta \in R^1$,

if target and actual outputs are different from each other,
extracting a set of patterns from all given patterns according to a predetermined pattern set extracting process,
and

transforming one by one patterns causing inseparability selected from said extracted set of patterns according
to a predetermined rule into patterns of different type, the learning process being effectuated again until a
certain finishing condition is satisfied, thereby finally obtaining a linearly separable pattern set;
determining said association coefficients between said output layer and said generated hidden neurons so as
to realize said given input/output pattern set by means of a combination of linearly separable pattern sets
realized by said hidden neurons.

2. The neural network according to claim 1, wherein said one or more hidden layers comprise hidden neurons, wherein
the hidden neurons in an odd-numbered hidden layer from said input layer are generated as required as said
learning process progresses, and the hidden neurons in an even-numbered hidden layer from said input layer are
provided in advance as output neurons with respect to the hidden neurons in said odd-numbered hidden layer.

3. The neural network according to claim 1, further comprising a bias neuron connected to each of said hidden neurons,
said bias neuron continuously being in an ON state when an input signal for making said bias neuron ON is
applied.

4. The neural network according to claim 1, wherein said hidden neurons are divided into a plurality of groups,
respective ones of said output neurons being provided as coupled to respective ones of the groups of the hidden
neurons, resulting in a multiple-input, multiple-output network.

5. A learning method for a neural network comprising an input layer having a plurality of input neurons for receiving
an input signal, said input signal containing an input/output pattern set (P), one or more hidden layers having one
or more hidden neurons for processing a signal received from said plurality of input neurons, and an output layer
having one or more output neurons for processing a signal received from said hidden neurons and for producing
an output signal, the learning method comprising the steps of:

determining whether said given input/output pattern is linearly separable or not;

applying input values from said input/output pattern to said input layer and corresponding output values from
said input/output pattern to said hidden neurons to effect a predetermined learning process on said hidden
neurons, if said given input/output pattern set is linearly separable;

determining association coefficients between said output layer and said hidden layers such that a signal from
said hidden neurons and an output signal from said output neurons which receive said signal from said hidden
neurons are equal to each other;

allocating a learning pattern set determined by a predetermined learning pattern set determining process,
between said hidden layers and said input layer to effect said learning process on said hidden neurons, if said
given input/output pattern set is linearly inseparable;

generating the hidden neurons of at least one of said hidden layers depending on said given input/output 
pattern set according to a predetermined process as said learning process progresses, each hidden neuron 
corresponding to a respective one of said linearly separable pattern sets; 

said learning method being characterised in that said learning process is effected by modifying association
coefficients between said hidden layers and said input layer according to the formula

$$w^{d+1} = w^d + \alpha \cdot x$$

where

d is an index indicative of a number of repeated calculations, and
α, a bias coefficient, is indicated by

1

$w^d$, $w^{d+1}$ being weights, $w \in R^n$,

x being the input pattern or set of input vectors,

T indicating transposition, and

θ being a neuron threshold value, and $\theta \in R^1$,

if target and actual outputs are different from each other,

extracting a set of patterns from all given patterns according to a predetermined pattern set extracting process,
and

transforming one by one patterns causing inseparability selected from said extracted set of patterns according
to a predetermined rule into patterns of different type, the learning process being effectuated again until a
certain finishing condition is satisfied, thereby finally obtaining a linearly separable pattern set;
determining said association coefficients between said output layer and said generated hidden neurons so as
to realize said given input/output pattern set by means of a combination of linearly separable pattern sets
realized by said hidden neurons.

6. The learning method according to claim 5, wherein said predetermined rule is defined to select a pattern remotest 
from the origin of a coordinate space in which the pattern is presented. 

7. The learning method according to claim 5, wherein said predetermined rule is defined to select a pattern whose 
inner product with the weight of a predetermined hidden neuron is maximum or minimum. 

8. The learning method according to claim 5, further comprising the steps of: 

applying an input pattern to said hidden neurons to determine whether an output of said output neurons agrees 
with the target output or not; and 

correcting the association coefficients between said hidden layers and said input layer until a predetermined 
finishing condition is satisfied, if the output of said output neurons disagrees with the target output. 

9. The learning method according to claim 8, wherein said finishing condition is satisfied when the number of times 
that said association coefficients are corrected exceeds a predetermined value. 

10. The learning method according to claim 8, wherein said finishing condition is based on a comparison between a
present weight and a weight which satisfies a predetermined condition at at least one time in the past, when
repetitive calculations are effected in said learning process.

11. The learning method according to claim 8, wherein said association coefficients are corrected by a step width
which is determined according to a predetermined correction rule each time said association coefficients are to
be corrected.

12. The learning method according to claim 5, wherein said learning process is effected by employing different transformation
functions $h_{ON}$, $h_{OFF}$ with respect to a given ON pattern whose target output is 1 and a given OFF pattern
whose target output is 0, maximizing a sum of the transformation functions with respect to all patterns, and transforming
said input/output pattern into a linearly separable pattern based on an optimization condition equation for
maximizing the sum of the transformation functions, if said input/output pattern is linearly unseparable.

13. The learning method according to claim 5, further comprising the steps of: 

determining a learning pattern according to a predetermined process from an original input/output pattern and
an input/output pattern realized by hidden neurons which have been learned; and
allocating the learning pattern to hidden neurons that have not been learned.

14. The learning method according to claim 5, wherein said pattern is transformed by transforming an OFF pattern
whose target output is 0 into an ON pattern whose target output is 1, and modifying the OFF pattern into an ON
pattern when all elements of the OFF pattern have a value equal to or greater than the value of corresponding
elements of an ON pattern.

15. The learning method according to claim 5, wherein said set of patterns is extracted by checking a change in the
association coefficients between said hidden layers and said input layer, and extracting a set of patterns in which
the association coefficients do not change and the target output disagrees with the actual output.

16. The learning method according to claim 8, wherein said finishing condition is based on either a value of a predetermined
object function having an argument comprising said association coefficients, or a hypogradient of said
object function.

17. The learning method according to claim 5, wherein said association coefficients between said output layer and
said generated hidden neurons are determined so as to have alternate sign in order of the generation of said
generated hidden neurons.

Claims (translation of the German text)

1. A neural network comprising:

an input layer having a plurality of input neurons for receiving an input signal, said input signal containing
an input/output pattern set (P), said plurality of input neurons comprising a predetermined fixed number of
input neurons;

one or more hidden layers having one or more hidden neurons for processing a signal received from said
plurality of input neurons, and

an output layer having one or more output neurons for processing a signal received from said hidden
neurons and for producing an output signal,

said input layer and said hidden layers being coupled to each other by association coefficients (wij)
determined by a learning process,

said hidden layers and said output layer being coupled to each other by association coefficients (wji) determined
such that a sum of association coefficients between the output layer neurons and first through odd-numbered
sequential hidden neurons in a single hidden layer is larger than a predetermined threshold and a sum of
association coefficients between the output neurons and first through even-numbered sequential hidden
neurons in the single hidden layer is smaller than the predetermined threshold,
the neural network automatically generating the hidden neurons of the single hidden layer as required according
to a predetermined process as the learning process progresses,
wherein the given input/output pattern set is implemented by a combination of linearly separable pattern
sets realized by the hidden neurons, each hidden neuron corresponding to a respective one of the linearly
separable patterns,
the neural network being characterized in that the learning process is effected by modifying
association coefficients ($w^d$) between the hidden layers and the input layer according to the
formula

$$w^{d+1} = w^d + \alpha \cdot x,$$

where

d is an index indicating a number of repeated calculations, and
α is a bias coefficient defined by



a « (6 w«x) xi 

wobei w^, w*^"" Gewichte sind, w G R" 
X die Eingabemuster oder die Menge von Eingabevektoren ist, 
T die Transposition bezeichnet und 
e ein Neuronenschweltenwert ist. und 6 G Ri, 

wenn sich Soil- und Istausgaben voneinander unterscheiden, 

Extrahieren einer Mustermenge aus alien gegebenen Mustem gemaB einem vorbestimmten Mustermengen- 
ExtraktionsprozeB und 

einzelweises Transformieren von eine Untrennbarkeit bewirkenden Mustem, die aus der extrahierten Muster- 
menge gewahit sind, gemaB einer vorbestimmten Regel in Muster untersch led lichen Typs, wobei der Lem- 
prozeB erneut durchgetuhrt wird, bis eine bestimnrrte Beendigungsbedingung erfulit ist, um schlieBllch eine 
linear trennbare Mustermenge zu erhalten; 

Bestimmen der Zuordnungskoeffizienten zwischen der Ausgabeschicht und djen erzeugten versteckten Neu- 
ronen, um mittels einer Kombination von durch die versteckten Neuronen realisierten linear trennbaren Mu- 
stermengen die gegebene Eingabe/Ausgabemustermenge zu realisieren. 
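
The correction step $w^{d+1} = w^{d} + \alpha \cdot x$ can be illustrated with a minimal Python sketch. The function names (`output`, `correct`, `train`), the fixed step `eta` used for $\alpha$, the sweep limit, and the AND example are assumptions made for illustration only. The claim defines $\alpha$ through $\theta$ and the current weight, and that expression is not reproduced here; a plain fixed-step perceptron rule is swapped in instead.

```python
def output(w, x, theta):
    """Threshold unit: 1 if the inner product w.x exceeds theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0


def correct(w, x, alpha):
    """One correction step: w_next = w + alpha * x."""
    return [wi + alpha * xi for wi, xi in zip(w, x)]


def train(patterns, targets, theta, eta=0.5, max_sweeps=100):
    """Correct w only when target and actual outputs differ.

    Assumption: alpha is +eta or -eta (fixed-step perceptron rule),
    not the claim's own expression for alpha.
    """
    w = [0.0] * len(patterns[0])
    for _ in range(max_sweeps):
        changed = False
        for x, t in zip(patterns, targets):
            if output(w, x, theta) != t:
                w = correct(w, x, eta if t == 1 else -eta)
                changed = True
        if not changed:  # every pattern already produces its target
            return w
    return None  # no separating weight found within max_sweeps


# Hypothetical example: the linearly separable AND pattern set.
AND_PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]
w_and = train(AND_PATTERNS, AND_TARGETS, theta=1.0)
```

With a linearly inseparable set such as XOR the loop never reaches a sweep without corrections, so `train` returns `None`; this is the situation in which the claim goes on to extract and transform patterns.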

2. A neural network according to claim 1, wherein the one or more hidden layers comprise hidden
neurons, the hidden neurons in an odd-numbered hidden layer from the input layer being generated as
required as the learning process progresses, and the hidden neurons in an even-numbered hidden layer
from the input layer being provided in advance as output neurons with respect to the hidden neurons
in the odd-numbered hidden layer.

3. A neural network according to claim 1, further comprising a bias neuron connected to each of the
hidden neurons, the bias neuron being continuously in an ON state when an input signal for turning
the bias neuron ON is applied.

4. A neural network according to claim 1, wherein the hidden neurons are divided into a plurality of
groups, respective ones of the output neurons being coupled to respective ones of the groups of
hidden neurons, to provide a multiple-input, multiple-output network.

5. A learning method for a neural network comprising an input layer having a plurality of input
neurons for receiving an input signal, the input signal containing an input/output pattern set (P),
one or more hidden layers having one or more hidden neurons for processing a signal received from
the plurality of input neurons, and an output layer having one or more output neurons for processing
a signal received from the hidden neurons and for producing an output signal, the learning method
comprising the steps of:

determining whether or not the given input/output pattern is linearly separable;

applying input values from the input/output pattern to the input layer and a corresponding output
value from the input/output pattern to the hidden neurons, to effect a predetermined learning
process on the hidden neurons, if the given input/output pattern set is linearly separable;

determining association coefficients between the output layer and the hidden layers such that a
signal from the hidden neurons and an output signal from the output neurons which receive the signal
from the hidden neurons are equal to each other;

allocating a learning pattern set, determined by a predetermined learning-pattern-set determining
process, between the hidden layers and the input layer, to effect the learning process on the hidden
neurons, if the given input/output pattern set is linearly inseparable;

generating the hidden neurons of at least one of the hidden layers depending on the given
input/output pattern set according to a predetermined process as the learning process progresses,
each hidden neuron corresponding to a respective one of the linearly separable pattern sets,
the learning method being characterized in that the learning process is effected by modifying
association coefficients between the hidden layers and the input layer according to the formula

$$w^{d+1} = w^{d} + \alpha \cdot x,$$

where

d is an index indicating the number of repeated calculations, and
$\alpha$ is a bias coefficient defined by

$$\alpha = (\theta - w^{dT} x)\, x_i$$

where $w^{d}$, $w^{d+1}$ are weights, $w \in R^{n}$,
x is the input pattern or the set of input vectors,
T denotes transposition, and
$\theta$ is a neuron threshold, $\theta \in R^{1}$,

if the target and actual outputs differ from each other,

extracting a pattern set from all given patterns according to a predetermined pattern-set
extraction process, and

transforming, one at a time, patterns that cause inseparability and are selected from the extracted
pattern set into patterns of a different type according to a predetermined rule, the learning
process being carried out again until a certain termination condition is satisfied, so as finally
to obtain a linearly separable pattern set;

determining the association coefficients between the output layer and the generated hidden neurons
so as to realize the given input/output pattern set by means of a combination of linearly separable
pattern sets realized by the hidden neurons.

6. A learning method according to claim 5, wherein the predetermined rule is defined to select a
pattern that is farthest from the origin of the coordinate space in which the pattern is located.

7. A learning method according to claim 5, wherein the predetermined rule is defined to select a
pattern whose inner product with the weight of a predetermined hidden neuron is maximal or minimal.
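
The two selection rules of claims 6 and 7 can be sketched in a few lines of Python. The function names, the use of the Euclidean norm for "farthest from the origin", and the sample patterns are assumptions for illustration, not part of the claims.

```python
def farthest_from_origin(patterns):
    """Claim-6-style rule: the pattern with the largest Euclidean norm.

    Comparing squared norms gives the same winner and avoids a square root.
    """
    return max(patterns, key=lambda x: sum(v * v for v in x))


def extreme_inner_product(patterns, w, largest=True):
    """Claim-7-style rule: the pattern whose inner product with w is
    maximal (largest=True) or minimal (largest=False)."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return max(patterns, key=score) if largest else min(patterns, key=score)


# Hypothetical candidate patterns and hidden-neuron weight.
candidates = [(0, 1), (1, 0), (1, 1)]
weight = (1, -1)
```

Ties are resolved in favour of the first pattern encountered, which is a property of Python's `max` and `min` rather than anything stated in the claims.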

8. A learning method according to claim 5, further comprising the steps of:

applying an input pattern to the hidden neurons to determine whether or not an output of the output
neurons agrees with the target output; and

correcting the association coefficients between the hidden layers and the input layer until a
predetermined termination condition is satisfied, if the output of the output neurons does not agree
with the target output.

9. A learning method according to claim 8, wherein the termination condition is satisfied when the
number of times the association coefficients have been corrected exceeds a predetermined value.

10. A learning method according to claim 8, wherein the termination condition is based on a
comparison between a present weight and a weight that satisfied a predetermined condition at least
once in the past, when repetitive calculations are carried out in the learning process.
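
The termination conditions of claims 9 and 10 can be sketched as a single check. This is an illustrative reading only: the function name `should_stop`, the returned labels, and the interpretation of claim 10 as "the present weight equals a weight recorded earlier" are assumptions, since the claim leaves the recording condition open.

```python
def should_stop(corrections, w, past_weights, max_corrections):
    """Return a reason to stop, or None to keep correcting.

    "limit":  the number of corrections exceeds max_corrections (claim 9).
    "repeat": the present weight matches a weight recorded in the past
              (one possible reading of claim 10; exact equality is assumed).
    """
    if corrections > max_corrections:
        return "limit"
    if tuple(w) in past_weights:
        return "repeat"
    return None


# Hypothetical record of weights seen during earlier iterations.
seen = {(0.5, 0.5), (1.0, 1.0)}
```

A repeated weight indicates that the corrections are cycling, which is what happens when the pattern set is linearly inseparable.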

11. A learning method according to claim 8, wherein the association coefficients are corrected by a
step width that is determined according to a predetermined correction rule each time the association
coefficients are to be corrected.

12. A learning method according to claim 5, wherein the learning process is effected by using
different transformation functions $h_{ON}$, $h_{OFF}$ with respect to a given ON pattern whose
target output is 1 and a given OFF pattern whose target output is 0, maximizing a sum of the
transformation functions with respect to all patterns, and transforming the input/output pattern
into a linearly separable pattern on the basis of an optimization condition equation for maximizing
the sum of the transformation functions, if the input/output pattern is linearly inseparable.

13. A learning method according to claim 5, further comprising the steps of:

determining a learning pattern according to a predetermined process from an original input/output
pattern and an input/output pattern realized by the hidden neurons which have been trained; and

allocating the learning pattern to the hidden neurons which have not been trained.

14. A learning method according to claim 5, wherein the pattern is transformed by transforming an
OFF pattern whose target output is 0 into an ON pattern whose target output is 1, and modifying the
OFF pattern into an ON pattern when all elements of the OFF pattern have a value equal to or greater
than the value of the corresponding elements of an ON pattern.
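
The OFF-to-ON transformation of claim 14 can be illustrated on the XOR pattern set. The sketch below is a simplified reading: the helper names are hypothetical, the comparison is made against the originally ON patterns only, and all qualifying OFF patterns are converted in one pass rather than one at a time.

```python
def dominates(a, b):
    """True if every element of a is >= the corresponding element of b."""
    return all(ai >= bi for ai, bi in zip(a, b))


def transform_off_to_on(patterns, targets):
    """Turn an OFF pattern (target 0) into an ON pattern (target 1) when it
    dominates, element by element, at least one original ON pattern."""
    on_patterns = [x for x, t in zip(patterns, targets) if t == 1]
    return [
        1 if t == 1 or any(dominates(x, on) for on in on_patterns) else 0
        for x, t in zip(patterns, targets)
    ]


# XOR is linearly inseparable; (1, 1) dominates (0, 1) and becomes ON.
XOR_PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]
new_targets = transform_off_to_on(XOR_PATTERNS, XOR_TARGETS)
```

The transformed targets describe the OR function, which a single threshold neuron can learn.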

15. A learning method according to claim 5, wherein the pattern set is extracted by checking a
change in the association coefficients between the hidden layers and the input layer, and extracting
a pattern set in which the association coefficients do not change and the target output does not
agree with the actual output.

16. A learning method according to claim 8, wherein the termination condition is based either on a
value of a predetermined objective function having an argument comprising the association
coefficients, or on a hypogradient of the objective function.

17. A learning method according to claim 5, wherein the association coefficients between the output
layer and the generated hidden neurons are determined so as to have alternating signs in the order
in which the generated hidden neurons are generated.
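
Alternating output-layer coefficients (claim 17) satisfy the partial-sum condition of claim 1: the sum over the first odd number of hidden neurons lies above the threshold and the sum over the first even number lies below it. The following Python sketch assumes coefficients of +1 and -1, a threshold of 0.5, and two hand-built hidden neurons (OR and AND); all of these values and names are illustrative assumptions.

```python
def alternating_weights(n):
    """+1, -1, +1, ... in the order the hidden neurons were generated."""
    return [1 if k % 2 == 0 else -1 for k in range(n)]


def network_output(hidden_outputs, theta=0.5):
    """Output neuron: threshold on the alternating-sign weighted sum."""
    v = alternating_weights(len(hidden_outputs))
    s = sum(vk * hk for vk, hk in zip(v, hidden_outputs))
    return 1 if s > theta else 0


def hidden_or(x):
    """First hidden neuron: a linearly separable set (OR)."""
    return 1 if x[0] + x[1] > 0.5 else 0


def hidden_and(x):
    """Second hidden neuron: a linearly separable set (AND)."""
    return 1 if x[0] + x[1] > 1.5 else 0


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [network_output([hidden_or(x), hidden_and(x)]) for x in INPUTS]
```

Here the linearly inseparable XOR set is realized as OR minus AND, a combination of two linearly separable sets.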



Claims (French text)

1. A neural network comprising:

an input layer having a plurality of input neurons for receiving an input signal, the input signal
containing an input/output pattern set (P), the plurality of input neurons comprising a
predetermined fixed number of input neurons,

one or more hidden layers having one or more hidden neurons for processing a signal received from
the plurality of input neurons, and

an output layer having one or more output neurons for processing a signal received from the hidden
neurons and for producing an output signal,

the input layer and the hidden layers being connected to each other by association coefficients
($w_{ij}$) determined by a learning process,

the hidden layers and the output layer being connected to each other by association coefficients
($w_{ij}$) determined such that a sum of association coefficients between the output neurons and the
first through odd-numbered consecutive hidden neurons in a single hidden layer is greater than a
predetermined threshold, and such that a sum of association coefficients between the output neurons
and the first through even-numbered consecutive hidden neurons in the single hidden layer is smaller
than the predetermined threshold,

the neural network automatically generating the hidden neurons of the single hidden layer as
required, according to a predetermined process, as the learning process progresses,

wherein the given input/output pattern set is implemented by a combination of linearly separable
pattern sets realized by the hidden neurons, each hidden neuron corresponding to a respective one of
the linearly separable patterns,

the neural network being characterized in that the learning process is effected by modifying
association coefficients ($w_{ij}$) between the hidden layers and the input layer according to the
formula

$$w^{d+1} = w^{d} + \alpha \cdot x,$$

where

d is an index indicating the number of repeated calculations, and
$\alpha$, a bias coefficient, is given by

$$\alpha = (\theta - w^{dT} x)\, x_i$$

$w^{d}$, $w^{d+1}$ being weights, $w \in R^{n}$,
x being the input pattern or the set of input vectors,
T denoting transposition, and
$\theta$ being a neuron threshold, $\theta \in R^{1}$,

if the target and actual outputs differ from each other,

by extracting a pattern set from all given patterns according to a predetermined pattern-set
extraction process, and

by transforming, one at a time, patterns that cause inseparability and are selected from the
extracted pattern set into patterns of a different type according to a predetermined rule, the
learning process being carried out again until a certain termination condition is satisfied, so as
finally to obtain a linearly separable pattern set,

by determining the association coefficients between the output layer and the generated hidden
neurons so as to realize the given input/output pattern set by means of a combination of linearly
separable pattern sets realized by the hidden neurons.

2. A neural network according to claim 1, wherein the one or more hidden layers comprise hidden
neurons, the hidden neurons in an odd-numbered hidden layer from the input layer being generated as
required as the learning process progresses, and the hidden neurons in an even-numbered hidden layer
from the input layer being provided in advance as output neurons with respect to the hidden neurons
of the odd-numbered hidden layer.

3. A neural network according to claim 1, further comprising a bias neuron connected to each of the
hidden neurons, the bias neuron being continuously in an ON state when an input signal for turning
the bias neuron ON is applied.

4. A neural network according to claim 1, wherein the hidden neurons are divided into a plurality of
groups, respective ones of the output neurons being connected to respective ones of the groups of
hidden neurons, resulting in a multiple-input, multiple-output network.

5. A learning method for a neural network comprising an input layer having a plurality of input
neurons for receiving an input signal, the input signal containing an input/output pattern set (P),
one or more hidden layers having one or more hidden neurons for processing a signal received from
the plurality of input neurons, and an output layer having one or more output neurons for processing
a signal received from the hidden neurons and for producing an output signal, the learning method
comprising the steps of:

determining whether or not the given input/output pattern is linearly separable,

applying input values from the input/output pattern to the input layer and a corresponding output
value from the input/output pattern to the hidden neurons, to effect a predetermined learning
process on the hidden neurons, if the given input/output pattern set is linearly separable,

determining association coefficients between the output layer and the hidden layers such that a
signal from the hidden neurons and an output signal from the output neurons which receive the signal
from the hidden neurons are equal to each other,

allocating a learning pattern set, determined by a predetermined learning-pattern-set determining
process, between the hidden layers and the input layer, to effect the learning process on the hidden
neurons, if the given input/output pattern set is linearly inseparable,

generating the hidden neurons of at least one of the hidden layers depending on the given
input/output pattern set according to a predetermined process as the learning process progresses,
each hidden neuron corresponding to a respective one of the linearly separable pattern sets,

the learning method being characterized in that the learning process is effected by modifying the
association coefficients between the hidden layers and the input layer according to the formula

$$w^{d+1} = w^{d} + \alpha \cdot x,$$

where

d is an index indicating the number of repeated calculations, and
$\alpha$, a bias coefficient, is given by

$$\alpha = (\theta - w^{dT} x)\, x_i$$

$w^{d}$, $w^{d+1}$ being weights, $w \in R^{n}$,
x being the input pattern or the set of input vectors,
T denoting transposition, and
$\theta$ being a neuron threshold, $\theta \in R^{1}$,

if the target and actual outputs differ from each other,

by extracting a pattern set from all given patterns according to a predetermined pattern-set
extraction process, and

by transforming, one at a time, patterns that cause inseparability and are selected from the
extracted pattern set into patterns of a different type according to a predetermined rule, the
learning process being carried out again until a certain termination condition is satisfied, so as
finally to obtain a linearly separable pattern set,

determining the association coefficients between the output layer and the generated hidden neurons
so as to realize the given input/output pattern set by means of a combination of linearly separable
pattern sets realized by the hidden neurons.

6. A learning method according to claim 5, wherein the predetermined rule is defined to select a
pattern that is farthest from the origin of a coordinate space in which the pattern is located.

7. A learning method according to claim 5, wherein the predetermined rule is defined to select a
pattern whose inner product with the weight of a predetermined hidden neuron is maximal or minimal.

8. A learning method according to claim 5, further comprising the steps of:

applying an input pattern to the hidden neurons to determine whether or not an output of the output
neurons agrees with the target output, and

correcting the association coefficients between the hidden layers and the input layer until a
predetermined termination condition is satisfied, if the output of the output neurons does not agree
with the target output.

9. A learning method according to claim 8, wherein the termination condition is satisfied when the
association coefficients have been corrected a number of times that exceeds a predetermined value.

10. A learning method according to claim 8, wherein the termination condition is based on a
comparison between a present weight and a weight that satisfied a predetermined condition at least
once in the past, when repetitive calculations are carried out in the learning process.

11. A learning method according to claim 8, wherein the association coefficients are corrected by a
step width that is determined according to a predetermined correction rule each time the
coefficients are to be corrected.

12. A learning method according to claim 5, wherein the learning process is effected by using
different transformation functions $h_{ON}$, $h_{OFF}$ with respect to a given ON pattern whose
target output is 1 and a given OFF pattern whose target output is 0, maximizing a sum of the
transformation functions with respect to all patterns, and transforming the input/output pattern
into a linearly separable pattern on the basis of an optimization condition equation for maximizing
the sum of the transformation functions, if the input/output pattern is linearly inseparable.

13. A learning method according to claim 5, further comprising the steps of:

determining a learning pattern according to a predetermined process from an original input/output
pattern and an input/output pattern realized by the hidden neurons which have been trained, and

allocating the learning pattern to hidden neurons which have not been trained.

14. A learning method according to claim 5, wherein the pattern is transformed by transforming an
OFF pattern whose target output is 0 into an ON pattern whose target output is 1, and modifying the
OFF pattern into an ON pattern when all elements of the OFF pattern have a value equal to or greater
than the value of the corresponding elements of an ON pattern.

15. A learning method according to claim 5, wherein the pattern set is extracted by checking a
change in the association coefficients between the hidden layers and the input layer, and extracting
a pattern set in which the association coefficients do not change and the target output does not
agree with the actual output.

16. A learning method according to claim 8, wherein the termination condition is based on a value of
a predetermined objective function having an argument comprising the association coefficients, or on
a hypogradient of the objective function.

17. A learning method according to claim 5, wherein the association coefficients between the output
layer and the generated hidden neurons are determined so as to have alternating signs in the order
of generation of the generated hidden neurons.
FIG. 3

FIG. 7




EP 0 521 729 B1

FIG. 8 (flowchart)
START; STEP 1: give input patterns x_p and output patterns t_p (p = 1, ..., m); further boxes
labelled STEP 2, STEP 3, STEP 5, STEP 6, STEP 7 and STEP 8, including the update k = k + 1 and a
transformation routine; construct the LISA network with the weight of the sum and difference between
output and hidden layers; END.



FIG. 9 (flowchart)
START; initialization w = 0; boxes labelled STEP 11, STEP 12, STEP 13, STEP 15 ("vibrating",
register p), STEP 16, STEP 20, STEP 21 (select one registered p which is OFF as p'), STEP 22 and
STEP 24; outcomes "linearly inseparable" and "linearly separable"; END.



FIG. 10 (panel labels (b) and (c))






FIG. 14
FLOWCHART OF ALGORITHM T
START; set initial values; set input pattern vector x[p][i], p = 1, ..., m, i = 1, ..., n; set
target output t[1][p], p = 1, ..., m; calculation of norm; process for higher speed.



FIG. 15 (flowchart, partly legible)
Output computed as h(x^T w - θ); accumulation of a quantity "sum" with terms in ε; assignment
sign := -1; Yes/No branches; terminal box "There is wrong answer. Failure finished."



FIG. 16 (algorithm flowchart, partly legible)
A check comparing patterns x_p and x_q; max Nr[p]; selection of an OFF pattern with maximum norm;
p := p + 1; t[k+1][p] := t'[k][p] - t[k][p] for p = 1, ..., m; updating of iteration k.



FIG. 17
FLOWCHART OF ALGORITHM



FIG. 18 (label: x_out)

FIG. 19 (labels: TO OTHER NEURONS, ON SIGNAL, OFF SIGNAL)



FIG. 20 (labels: x_out1, x_out2)



FIG. 22 (axes: Log sec against NUMBER OF PATTERNS)



FIG. 23 (axes: PERCENTAGE OF CORRECT ANSWERS in %, 20 to 100, against NUMBER OF PATTERNS N, 25 to
125; legend: Back-Propagation, LISA)



